Concatenate lists together - python

I have a dictionary where the value elements are lists:
d1={'A': [], 'C': ['SUV'], 'B': []}
I need to concatenate the values into a single list, skipping the lists that are empty.
Expected output:
o=['SUV']
Help is appreciated.

from itertools import chain
d1={'A': [], 'C': ['SUV'], 'B': []}
print list(chain.from_iterable(d1.itervalues()))

>>> d1 = {'A': [], 'C': ['SUV'], 'B': []}
>>> [ele for lst in d1.itervalues() for ele in lst]
['SUV']

You can use itertools.chain, but the order can be arbitrary because dicts are unordered collections (before Python 3.7, which guarantees insertion order). So you may have to sort the dict by keys or values to get the desired result.
>>> d1={'A': [], 'C': ['SUV'], 'B': []}
>>> from itertools import chain
>>> list(chain(*d1.values())) # or use d1.itervalues() as it returns an iterator(memory efficient)
['SUV']
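If the order of the result matters, one option is to walk the keys in sorted order. The following is a minimal Python 3 sketch; the dictionary cars is made-up data in the same shape as d1, not something from the question.

```python
from itertools import chain

# Hypothetical data: same shape as d1, but with more than one non-empty list.
cars = {'B': ['Sedan'], 'A': [], 'C': ['SUV', 'Truck']}

# Visit the keys in sorted order so the result does not depend on dict order.
# Empty lists contribute nothing, so they need no special handling.
ordered = list(chain.from_iterable(cars[key] for key in sorted(cars)))
print(ordered)  # ['Sedan', 'SUV', 'Truck']
```

chain.from_iterable takes the lists lazily, so it avoids unpacking all values into arguments the way chain(*...) does.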

>>> from operator import add
>>> d1={'A': [], 'C': ['SUV'], 'B': []}
>>> reduce(add,d1.itervalues())
['SUV']
Or more comprehensive example:
>>> d2={'A': ["I","Don't","Drive"], 'C': ['SUV'], 'B': ["Boy"]}
>>> reduce(add,d2.itervalues())
['I', "Don't", 'Drive', 'SUV', 'Boy']
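For Python 3, where reduce has moved to functools and itervalues() no longer exists, the same idea might look like the sketch below. The starting value [] is an addition that is not in the answer above; without it, reduce raises a TypeError on an empty dictionary.

```python
from functools import reduce  # reduce is not a builtin in Python 3
from operator import add

d2 = {'A': ["I", "Don't", "Drive"], 'C': ['SUV'], 'B': ["Boy"]}

# The third argument is the starting value; it also makes an empty dict safe.
flat = reduce(add, d2.values(), [])
empty = reduce(add, {}.values(), [])

print(flat)   # ['I', "Don't", 'Drive', 'SUV', 'Boy']
print(empty)  # []
```

Note that each add builds a new list, so for many long lists itertools.chain is likely the cheaper choice.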

Related

Get following dict structure in Python? [duplicate]

I want to generate a dictionary from a list of dictionaries, grouping list items by the value of some key, such as:
input_list = [
{'a':'tata', 'b': 'foo'},
{'a':'pipo', 'b': 'titi'},
{'a':'pipo', 'b': 'toto'},
{'a':'tata', 'b': 'bar'}
]
output_dict = {
'pipo': [
{'a': 'pipo', 'b': 'titi'},
{'a': 'pipo', 'b': 'toto'}
],
'tata': [
{'a': 'tata', 'b': 'foo'},
{'a': 'tata', 'b': 'bar'}
]
}
So far I've found two ways of doing this. The first simply iterates over the list, creates a sublist in the dict for each key value, and appends the elements matching that key to the sublist:
l = [
{'a':'tata', 'b': 'foo'},
{'a':'pipo', 'b': 'titi'},
{'a':'pipo', 'b': 'toto'},
{'a':'tata', 'b': 'bar'}
]
res = {}
for e in l:
    res[e['a']] = res.get(e['a'], [])
    res[e['a']].append(e)
And another using itertools.groupby:
import itertools
from operator import itemgetter
l = [
{'a':'tata', 'b': 'foo'},
{'a':'pipo', 'b': 'titi'},
{'a':'pipo', 'b': 'toto'},
{'a':'tata', 'b': 'bar'}
]
l = sorted(l, key=itemgetter('a'))
res = dict((k, list(g)) for k, g in itertools.groupby(l, key=itemgetter('a')))
I wonder which alternative is the most efficient?
Is there any more pythonic/concise or better performing way of achieving this?
Is it correct that you want to group your input list by the value of the 'a' key of the list elements? If so, your first approach is the best, one minor improvement, use dict.setdefault:
res = {}
for item in l:
    res.setdefault(item['a'], []).append(item)
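A closely related variant is collections.defaultdict, which creates the empty list on first access. This is only a sketch of that alternative, reusing the input_list from the question:

```python
from collections import defaultdict

input_list = [
    {'a': 'tata', 'b': 'foo'},
    {'a': 'pipo', 'b': 'titi'},
    {'a': 'pipo', 'b': 'toto'},
    {'a': 'tata', 'b': 'bar'},
]

# A missing key is filled with a new empty list the first time it is accessed.
res = defaultdict(list)
for item in input_list:
    res[item['a']].append(item)

print(dict(res))  # convert back to a plain dict for display
```

Like the setdefault loop, this makes a single pass over the input.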
If by efficient you mean "time efficient", you can measure it using the built-in timeit module.
For example:
import timeit
import itertools
from operator import itemgetter
input = [{'a': 'tata', 'b': 'foo'},
{'a': 'pipo', 'b': 'titi'},
{'a': 'pipo', 'b': 'toto'},
{'a': 'tata', 'b': 'bar'}]
def solution1():
    res = {}
    for e in input:
        res[e['a']] = res.get(e['a'], [])
        res[e['a']].append(e)
    return res
def solution2():
    l = sorted(input, key=itemgetter('a'))
    res = dict(
        (k, list(g)) for k, g in itertools.groupby(l, key=itemgetter('a'))
    )
    return res
t = timeit.Timer(solution1)
print(t.timeit(10000))
# 0.0122511386871
t = timeit.Timer(solution2)
print(t.timeit(10000))
# 0.0366218090057
Please refer to the timeit official docs for further information.
A one liner -
>>> import itertools
>>> input_list = [
... {'a':'tata', 'b': 'foo'},
... {'a':'pipo', 'b': 'titi'},
... {'a':'pipo', 'b': 'toto'},
... {'a':'tata', 'b': 'bar'}
... ]
>>> {k:[v for v in input_list if v['a'] == k] for k, val in itertools.groupby(input_list,lambda x: x['a'])}
{'tata': [{'a': 'tata', 'b': 'foo'}, {'a': 'tata', 'b': 'bar'}], 'pipo': [{'a': 'pipo', 'b': 'titi'}, {'a': 'pipo', 'b': 'toto'}]}
The best approach is the first one you mentioned, and you can even make it more elegant by using setdefault as mentioned by bernhard above. The complexity of this approach is O(n): we iterate over the input once, and for each item we look up the appropriate list in the output dict we are building and append to it, which takes constant time (lookup + append) per item. So the overall complexity is O(n), which is optimal.
When using itertools.groupby, you must sort the input beforehand (which is O(n log n)).
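The sort is needed because groupby only merges adjacent items with equal keys. A small sketch using the question's data shows the difference (the variable names here are illustrative):

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {'a': 'tata', 'b': 'foo'},
    {'a': 'pipo', 'b': 'titi'},
    {'a': 'pipo', 'b': 'toto'},
    {'a': 'tata', 'b': 'bar'},
]
key = itemgetter('a')

# Without sorting, 'tata' shows up as two separate groups.
unsorted_keys = [k for k, _ in groupby(rows, key=key)]
# After sorting, each key forms exactly one group.
sorted_keys = [k for k, _ in groupby(sorted(rows, key=key), key=key)]

print(unsorted_keys)  # ['tata', 'pipo', 'tata']
print(sorted_keys)    # ['pipo', 'tata']
```

Building a dict directly from the unsorted groups would silently keep only the last 'tata' group.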

Python how to use defaultdict fromkeys to generate a dictionary with predefined keys and empty lists

Here is the code:
from collections import defaultdict
result = defaultdict.fromkeys(['a','b','c'], list)
result['a'].append('1')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-44-6c01c8d56a42> in <module>()
----> 1 result['a'].append('1')
TypeError: descriptor 'append' requires a 'list' object but received a 'str'
I don't understand the error message, what went wrong and how to fix it?
The .fromkeys method sets every key of a dict to the same single value (None by default):
>>> {}.fromkeys(['a','b','c'])
{'a': None, 'c': None, 'b': None}
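It does not call a function for each key. Whatever you pass as the value (list or []) is stored as-is under every key, so with list each value is the list type itself, and result['a'].append is the unbound list.append method. That is why the error complains that it did not receive a 'list' object.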
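To see what fromkeys actually stored in the question's code, a short Python 3 check like the following can help. It is a sketch for inspection only; the exact wording of the TypeError message differs between Python versions, so it is not reproduced here.

```python
from collections import defaultdict

result = defaultdict.fromkeys(['a', 'b', 'c'], list)

# Every key maps to the list class itself, not to a new empty list.
print(result['a'] is list)     # True
# fromkeys builds the defaultdict without any factory function.
print(result.default_factory)  # None

error = None
try:
    # This is really list.append('1'): there is no list instance to append to.
    result['a'].append('1')
except TypeError as exc:
    error = exc

print(type(error).__name__)    # TypeError
```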
Default dict needs a 'factory function' to be a default dict:
>>> from collections import defaultdict
>>> result=defaultdict(list)
>>> result
defaultdict(<type 'list'>, {})
The factory function (in this case, list) is called whenever a missing key is looked up in a defaultdict, and its return value becomes the default value for that key.
So to set three lists with keys 'a','b','c' you would do:
>>> for e in ('a','b','c'):
...     result[e]   # 'e' is not in the dict, so it is added
...                 # and a new list is the value
>>> result
defaultdict(<type 'list'>, {'a': [], 'c': [], 'b': []})
Or, you can use the .update method as Raymond Hettinger points out:
>>> result=defaultdict(list)
>>> result.update((k,[]) for k in 'abc')
>>> result
defaultdict(<type 'list'>, {'a': [], 'c': [], 'b': []})
Or, as ivan_pozdeev points out in comments, you can do:
>>> di=defaultdict(list,{k:[] for k in 'abc'})
>>> di
defaultdict(<type 'list'>, {'a': [], 'c': [], 'b': []})
Or you can use a regular Python dict with a dict comprehension to get the same thing -- no defaultdict required -- if you only need or want those three keys with unique lists as their values:
>>> di={k:[] for k in 'abc'}
>>> di
{'a': [], 'c': [], 'b': []}
And those are separate lists for each key:
>>> di['a'].append(1)
>>> di
{'a': [1], 'c': [], 'b': []}
A common mistake (which I have made ;-0) is to use something like .fromkeys and get the same list referred to by multiple keys:
>>> di2={}.fromkeys(['a','b','c'], [])
>>> di2
{'a': [], 'c': [], 'b': []} # looks ok
>>> di2['a'].append('WRONG!')
>>> di2
{'a': ['WRONG!'], 'c': ['WRONG!'], 'b': ['WRONG!']}
This happens because the same single list is referred to by all the keys.
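An identity check makes the difference between the two constructions visible. This is a minimal sketch; the names shared and separate are just for illustration.

```python
keys = ['a', 'b', 'c']

shared = dict.fromkeys(keys, [])   # one list object, referenced three times
separate = {k: [] for k in keys}   # a new list object for every key

print(shared['a'] is shared['b'])      # True
print(separate['a'] is separate['b'])  # False

shared['a'].append('x')
separate['a'].append('x')
print(shared)    # {'a': ['x'], 'b': ['x'], 'c': ['x']}
print(separate)  # {'a': ['x'], 'b': [], 'c': []}
```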
Unfortunately, fromkeys() is for setting the same value over and over again. It isn't helpful when you need distinct lists.
So, I would tackle the problem like this:
>>> keys = ['barry', 'tim', 'fredrik', 'alex']
>>> d = defaultdict(list)
>>> d.update((k, []) for k in keys)
>>> d
defaultdict(<class 'list'>, {'barry': [], 'tim': [], 'fredrik': [], 'alex': []})

Using list comprehension to add one to a value in a dictionary

I'm trying to get rid of this for loop and instead use list comprehension to give the same result.
import nltk
import requests
from bs4 import BeautifulSoup
freqdist = nltk.FreqDist()
html = requests.get("http://www.nrc.nl/nieuws/2015/04/19/louise-gunning-vertrekt-als-voorzitter-bestuur-uva/")
raw = BeautifulSoup(html.text).text
for word in nltk.word_tokenize(raw):
    freqdist[word.lower()] += 1
I'm not sure if it's possible, but I can't get it to work because of the +=1. I've tried:
[freqdist[word.lower()] +=1 for word in nltk.word_tokenize(raw)]
But that will only raise an error. Could anyone point me in the right direction?
If you want to mutate an existing list/dictionary, using a list/dictionary comprehension is considered bad style because it creates an unnecessary throwaway-list/dictionary.
To be precise, I'm talking about the following:
>>> demo = ['a', 'b', 'c']
>>> freqdist = {'a': 0, 'b': 1, 'c': 2}
>>> [freqdist.__setitem__(key, freqdist[key] + 1) for key in demo]
[None, None, None]
>>> freqdist
{'a': 1, 'c': 3, 'b': 2}
As you can see, doing what you describe is possible, but that's not how you should do it, because:
it is hard to read;
it creates an unused throwaway list [None, None, None];
list comprehensions should be used to build a new list that you actually need.
Creating a new dictionary with a dictionary comprehension is cumbersome as well, because not every value should be incremented (only the ones in demo).
You could do
>>> demo = ['a', 'b', 'c']
>>> freqdist = {'a': 0, 'b': 1, 'c': 2}
>>> freqdist = {k:v + (k in demo) for k,v in freqdist.items()}
>>> freqdist
{'a': 1, 'c': 3, 'b': 2}
However, we now have suboptimal runtime complexity, because for each key in freqdist we do an O(len(demo)) membership test on demo.
You could use a set for demo to reduce the complexity of building the dictionary to O(len(freqdist)), but the result only matches the for loop if the elements of demo are unique, since a repeated element adds 1 just once here.
>>> demo = set(['a', 'b', 'c'])
>>> freqdist = {'a': 0, 'b': 1, 'c': 2}
>>> freqdist = {k:v + (k in demo) for k,v in freqdist.items()}
>>> freqdist
{'a': 1, 'c': 3, 'b': 2}
I don't think this solution is particularly elegant, either.
In conclusion, your for loop is perfectly fine. The only good alternative would be to use a Counter object that you update:
>>> from collections import Counter
>>> demo = ['a', 'b', 'c']
>>> freqdist = Counter({'a': 0, 'b': 1, 'c': 2})
>>> freqdist.update(demo)
>>> freqdist
Counter({'c': 3, 'b': 2, 'a': 1})
This is the solution I would use personally.
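Applied to a stream of words, the Counter approach could look like the sketch below. The sample text and the use of str.split() are stand-ins for the downloaded page and nltk.word_tokenize, chosen so the example runs without any third-party package.

```python
from collections import Counter

# Stand-in input: str.split() replaces nltk.word_tokenize in this sketch.
raw = "Hello goodbye hello GDby Dog cat dog"

# Counter consumes any iterable of hashable items and counts each occurrence.
freqdist = Counter(word.lower() for word in raw.split())

print(freqdist['hello'])    # 2
print(freqdist['missing'])  # 0, absent keys count as zero
```

A later freqdist.update(more_words) adds further counts to the same object.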
This works:
>>> txt = 'Hello goodbye hello GDby Dog cat dog'
>>> txt_new = txt.lower().split()
>>> print txt_new
['hello', 'goodbye', 'hello', 'gdby', 'dog', 'cat', 'dog']
Now use collections
>>> import collections
>>> collections.Counter(txt_new)
Counter({'hello': 2, 'dog': 2, 'gdby': 1, 'cat': 1, 'goodbye': 1})
If you are not allowed to use collections.Counter then:
>>> {word: txt_new.count(word) for word in set(txt_new)}
{'goodbye': 1, 'dog': 2, 'hello': 2, 'gdby': 1, 'cat': 1}

How to initialize defaultdict with keys?

I have a dictionary of lists, and it should be initialized with default keys. I guess, the code below is not good (I mean, it works, but I don't feel that it is written in the pythonic way):
d = {'a' : [], 'b' : [], 'c' : []}
So I want to use something more pythonic like defaultdict:
d = defaultdict(list)
However, every tutorial that I've seen sets the new keys dynamically. In my case all the keys should be defined from the start. I'm parsing other data structures, and I add values to my dictionary only if a specific key in the structure is also contained in my dictionary.
How can I set the default keys?
From the comments, I'm assuming you want a dictionary that fits the following conditions:
Is initialized with set of keys with an empty list value for each
Has defaultdict behavior that can initialize an empty list for non-existing keys
@Aaron_lab has the right method, but there's a slightly cleaner way:
d = defaultdict(list,{ k:[] for k in ('a','b','c') })
That's already reasonable but you can shorten that up a bit with a dict comprehension that uses a standard list of keys.
>>> standard_keys = ['a', 'b', 'c']
>>> d1 = {key:[] for key in standard_keys}
>>> d2 = {key:[] for key in standard_keys}
>>> ...
If you're going to pre-initialize to empty lists, there is no need for a defaultdict. Simple dict-comprehension gets the job done clearly and cleanly:
>>> {k : [] for k in ['a', 'b', 'c']}
{'a': [], 'b': [], 'c': []}
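With a plain dict, the "add only if the key is already present" rule from the question becomes a simple membership test. In this sketch the parsed list is hypothetical input, standing in for whatever structure is being parsed:

```python
d = {k: [] for k in ['a', 'b', 'c']}

# Hypothetical parsed input: (key, value) pairs, some with unknown keys.
parsed = [('a', 1), ('x', 99), ('b', 2), ('a', 3)]

for key, value in parsed:
    if key in d:  # unknown keys are skipped and never created
        d[key].append(value)

print(d)  # {'a': [1, 3], 'b': [2], 'c': []}
```

A defaultdict would silently create the key 'x' on d['x'].append(99), which is the opposite of what the question asks for.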
If you have a closed set of keys (['a', 'b', 'c'] in your example) that you know you'll use, you can definitely use the answers above.
BUT...
dd = defaultdict(list) gives you much more than d = {'a':[], 'b':[], 'c':[]}.
You can append to "not existing" keys in a defaultdict:
>>> dd['d'].append(5)
>>> dd
defaultdict(list, {'d': [5]})
whereas if you do:
>>> d['d'].append(5)  # you'll face a KeyError
KeyError: 'd'
I recommend doing something like:
>>> d = {'a' : [], 'b' : [], 'c' : []}
>>> default_d = defaultdict(list, **d)
Now you have a dict holding your 3 keys ['a', 'b', 'c'] with empty lists as values, and you can also append to other keys without explicitly writing d['new_key'] = [] before appending.
You can have a function defined which will return you a dict with preset keys.
def get_preset_dict(keys=['a','b','c'],values=None):
    d = {}
    if not values:
        # one new list per key; [[]]*len(keys) would share a single list
        values = [[] for _ in keys]
    if len(keys)!=len(values):
        raise ValueError('unequal lengths')
    for index,key in enumerate(keys):
        d[key] = values[index]
    return d
In [8]: get_preset_dict()
Out[8]: {'a': [], 'b': [], 'c': []}
In [18]: get_preset_dict(keys=['a','e','i','o','u'])
Out[18]: {'a': [], 'e': [], 'i': [], 'o': [], 'u': []}
In [19]: get_preset_dict(keys=['a','e','i','o','u'],values=[[1],[2,2,2],[3],[4,2],[5]])
Out[19]: {'a': [1], 'e': [2, 2, 2], 'i': [3], 'o': [4, 2], 'u': [5]}
from collections import defaultdict
list(map((data := defaultdict(list)).__getitem__, 'abcde'))
data
Out[3]: defaultdict(list, {'a': [], 'b': [], 'c': [], 'd': [], 'e': []})

Adding multiple key,value pair using dictionary comprehension

For a list of dictionaries:
sample_dict = [
{'a': 'woot', 'b': 'nope', 'c': 'duh', 'd': 'rough', 'e': '1'},
{'a': 'coot', 'b': 'nope', 'c': 'ruh', 'd': 'rough', 'e': '2'},
{'a': 'doot', 'b': 'nope', 'c': 'suh', 'd': 'rough', 'e': '3'},
{'a': 'soot', 'b': 'nope', 'c': 'fuh', 'd': 'rough', 'e': '4'},
{'a': 'toot', 'b': 'nope', 'c': 'cuh', 'd': 'rough', 'e': '1'}
]
How do I make a separate dictionary that contains all the key,value pairs that match a certain key? With a list comprehension I created a list of all the key,value pairs like this:
container = [[key,val] for s in sample_dict for key,val in s.iteritems() if key == 'a']
Now the container gave me
[['a', 'woot'], ['a', 'coot'], ['a', 'doot'], ['a', 'soot'], ['a', 'toot']]
Which is all fine... but if I do the same with a dictionary comprehension, I get only a single key,value pair. Why does this happen?
container = {key : val for s in sample_dict for key,val in s.iteritems() if key == 'a'}
The container gives only a single element
{'a': 'toot'}
I want something like
{'a': ['woot','coot','doot','soot','toot']}
How do I do this with minimal change to the code above?
You are generating multiple key-value pairs with the same key, and a dictionary will only ever store unique keys.
If you wanted just one key, you'd use a dictionary with a list comprehension:
container = {'a': [s['a'] for s in sample_dict if 'a' in s]}
Note that there is no need to iterate over the nested dictionaries in sample_dict if all you wanted was a specific key; in the above I simply test if the key exists ('a' in s) and extract the value for that key with s['a']. This is much faster than looping over all the keys.
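If more than one key is needed, the same pattern can be wrapped in a dict comprehension over the wanted keys. This is a sketch with a shortened sample_dict; the name wanted is an assumption for illustration.

```python
# Shortened version of the question's data; the last entry has no 'e' key.
sample_dict = [
    {'a': 'woot', 'b': 'nope', 'e': '1'},
    {'a': 'coot', 'b': 'nope', 'e': '2'},
    {'a': 'doot', 'b': 'nope'},
]

wanted = ['a', 'e']  # hypothetical list of keys to collect

# One list per wanted key; dictionaries lacking the key are skipped.
container = {key: [s[key] for s in sample_dict if key in s] for key in wanted}

print(container)  # {'a': ['woot', 'coot', 'doot'], 'e': ['1', '2']}
```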
Another option:
filter = lambda arr, x: { x: [ e.get(x) for e in arr] }
So, from here, you can construct the dict based on the original array and the key
filter(sample_dict, 'a')
# {'a': ['woot', 'coot', 'doot', 'soot', 'toot']}
