I am writing a script to add missing keys within a list of dictionaries and assign them a default value. I start by building a set of all the possible keys that appear in one or more dictionaries.
I adapted a nice snippet of code for this but I'm having trouble fully wrapping my head around how it works:
all_keys = set().union(*dicts)
From how I understand this, my list of dictionaries dicts is unpacked into individual (dictionary) arguments for the union method, which merges them all together with the empty set, giving me a set of keys.
What isn't clear to me is why this code builds the set using just the keys of the dictionaries, while discarding their associated values. This is in fact what I want to happen, but how it is achieved here is murky. For example, I know unpacking a dictionary with a single * unpacks just the keys, which seems like what is happening here, except in my code I am not explicitly unpacking the contents of the dictionaries, only the list that contains them.
Can someone explain to me a little more explicitly what is happening under the hood here?
If you wrote:
s1 = set()
s2 = s1.union(iterable1, iterable2, iterable3)
the union() method would iterate over each iterableX to get the values to combine with s1.
Your code simply supplies all the iterables by unpacking dicts with *, so it's equivalent to
s2 = s1.union(dicts[0], dicts[1], dicts[2], ...)
and union() then iterates over each dictionary. Iterating over a dictionary yields its keys, not its values, so only the keys end up in the set.
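A minimal sketch of the whole task, assuming a small made-up list of dictionaries and None as the default value (both are illustrative, not from the question):

```python
# Hypothetical sample data; keys and values are illustrative only.
dicts = [{"a": 1, "b": 2}, {"b": 3, "c": 4}, {"d": 5}]

# Iterating over a dict yields its keys, which is all union() ever sees.
first_keys = sorted(dicts[0])

# Equivalent to set().union(dicts[0], dicts[1], dicts[2]).
all_keys = set().union(*dicts)

# Build new dictionaries with every key present; None as the default
# is an assumption, use whatever default suits your data.
default = None
filled = [{key: d.get(key, default) for key in all_keys} for d in dicts]
```

Here the values never reach the set because union() only iterates its arguments, and the values come back in only through d.get().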
I've been trying to insert a new dictionary to a sorted list of dictionaries while maintaining the order.
To sort the list of dictionaries I've used sample_dict = sorted(sample_dict, key=lambda k: k['ID'])
It seems the only solution would be to iterate over the list and compare the ID of each entry with the previous one but this solution does not sound optimal (time-wise).
I have also found the bisect library, which allows inserting entries into a list while keeping the correct order, but it seems it does not work with dictionaries (it throws TypeError: '<' not supported between instances of 'dict' and 'dict'). I also want to mention that my entries contain a lot of key-value pairs (21) and I am not sure if there are any alternatives to dictionaries (e.g. tuples). Lastly, I want to mention that "ID" is a string.
Is there something I am missing or is iterating over the whole list for each insertion the only solution?
Thanks in advance
Unfortunately, before Python 3.10 bisect didn't allow you to give a key parameter like sorted does (newer versions accept key=). But it's easy to get around that on any version by keeping a list of tuples instead of dictionaries. Less-than on a tuple compares element by element, so if the first element of the tuple is your key then everything works.
sample_dict = sorted(((k['ID'], k) for k in sample_dict))
As mentioned in the comments, this still fails if two list items have the same ID, because the comparison then moves on to the second tuple element, which is the dictionary itself, and raises the same TypeError. The solution is to add another tuple element in between that is guaranteed to never be equal.
sample_dict = sorted(((k['ID'], index, k) for index,k in enumerate(sample_dict)))
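To show how insertion then works with that tuple layout, here is a rough sketch. The records, the insert_record helper and the itertools.count tiebreaker are all assumptions made for illustration; only bisect.insort and the (ID, index, dict) shape come from the answer above.

```python
import bisect
from itertools import count

# Hypothetical records; only the "ID" key is taken from the question.
records = [{"ID": "b17", "name": "x"}, {"ID": "a03", "name": "y"}]

# A running counter serves as the never-equal tiebreaker element.
tiebreak = count()
entries = sorted((r["ID"], next(tiebreak), r) for r in records)

def insert_record(entries, record):
    """Insert record into entries, keeping the list sorted by ID."""
    bisect.insort(entries, (record["ID"], next(tiebreak), record))

insert_record(entries, {"ID": "a99", "name": "z"})
insert_record(entries, {"ID": "a03", "name": "dup"})  # duplicate ID is fine

ids = [entry[0] for entry in entries]
```

The position is found by binary search rather than a full scan, although the list still has to shift elements to make room. The dictionary is always the last element of each tuple, so entries[i][-1] gets it back.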
I have values in a list of lists.
I would like to send the whole block to a conversion function which then returns all the converted values in the same structure.
my_list = [sensor1...sensor4] = [hum1...hum3] = [value1, value2, value3, value4]
So several nested lists
def conversion(my_list):
    for sensor in my_list:
        for hum in sensor:
            for value in hum:
                map(function, value)
Is there a way to do a list comprehension as a one liner? I'm not sure how to use the map function in comprehensions especially when you have several nested iterations.
map(function, value)
Since you are just mapping a function on each value, without collecting the return value in a list, using a list comprehension is not a good idea. You could do it, but you would be collecting list items that have no value, for the sole purpose of throwing them away later—just so you can save a few lines that actually serve a much better purpose: Clearly telling what’s going on, without being in a single, long, and complicated line.
So my advice would be to keep it as it is. It makes more sense like that and clearly shows what’s going on.
I am however collecting the values. They all need to be converted and saved in the same structure as they were.
In that case, you still don’t want a list comprehension, as that would mean creating a new list (for no real reason). Instead, just update the innermost list. To do that, you need to change the way you’re iterating, though:
for sensor in my_list:
    for hum in sensor:
        for i, value in enumerate(hum):
            hum[i] = map(function, value)
This will update the inner list.
Alternatively, since value is actually a list of values, you can also replace the value list’s contents using the slicing syntax:
for sensor in my_list:
    for hum in sensor:
        for value in hum:
            value[:] = map(function, value)
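Also one final note: If you are using Python 3, remember that map returns a lazy iterator rather than a list, so for the hum[i] = ... form you need to convert it first using list(map(function, value)), or use a list comprehension for that part with [function(v) for v in value]. Slice assignment accepts any iterable, so value[:] = map(function, value) works as written.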
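Put together for Python 3, the in-place update could look like the sketch below. The convert function and the sample readings are made up for illustration; convert stands in for whatever function is in the question.

```python
def convert(raw):
    # Hypothetical conversion standing in for `function`: halve the reading.
    return raw * 0.5

# Made-up data in the shape described: sensors -> hums -> value lists.
my_list = [
    [[[2, 4], [6, 8]], [[10, 12]]],
    [[[14, 16]]],
]

for sensor in my_list:
    for hum in sensor:
        for value in hum:
            # Slice assignment replaces the contents of the existing list,
            # so the nesting (and any other references to it) stays intact.
            value[:] = [convert(v) for v in value]
```

After the loops, my_list has the same structure as before with every reading converted.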
This is the right way to do it. You can use list comprehension to do that, but you shouldn't for code readability and because it's probably not faster.
Consider the below situation. I have a list:
feature_dict = vectorizer.get_feature_names()
It just holds some strings, all of which are a kind of internal identifier, completely meaningless on their own. I also have a dictionary (it is filled in a different part of the code):
phoneDict = dict()
This dictionary has the mentioned identifiers as keys, and the values assigned to them are, well, good values which mean something.
I want to create a new list preserving the order of the original list (this is crucial) but replacing each element with its value from the dictionary. So I thought about creating a new list by applying a function to each element of the list, but with no luck.
I tried to create a function:
def fastMap(x):
return phoneDict[x]
And then map it:
map(fastMap, feature_dict)
It just returns
<map object at 0x0000000017DFBD30>
and nothing else.
Has anyone tried to solve a similar problem?
Just convert the result to list:
list(map(fastMap, feature_dict))
Why? map() returns an iterator, see https://docs.python.org/3/library/functions.html#map:
map(function, iterable, ...)
Return an iterator that applies function
to every item of iterable, yielding the results. If additional
iterable arguments are passed, function must take that many arguments
and is applied to the items from all iterables in parallel. With
multiple iterables, the iterator stops when the shortest iterable is
exhausted. For cases where the function inputs are already arranged
into argument tuples, see itertools.starmap().
which you can convert to a list with list()
Note: in Python 2, map() returns a list, but this was changed in Python 3 to return an iterator.
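As a small illustration, here is the lookup done both ways. The identifiers and values are invented, since the real ones come from vectorizer.get_feature_names() and code not shown here.

```python
# Hypothetical stand-ins for the real data.
phoneDict = {"id_7": "alpha", "id_2": "beta", "id_9": "gamma"}
feature_dict = ["id_2", "id_9", "id_7", "id_2"]

def fastMap(x):
    return phoneDict[x]

# map() is lazy in Python 3; list() consumes it in the original order.
names_via_map = list(map(fastMap, feature_dict))

# An equivalent list comprehension, with no helper function needed.
names_via_comprehension = [phoneDict[x] for x in feature_dict]
```

Both forms keep the order of feature_dict, and both raise KeyError if an identifier is missing from phoneDict.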
This is a question and answer I wanted to share, since I found it very useful.
Suppose I have a dictionary accessible with different keys. And at each position of the dictionary I have a list of a fixed length:
a={}
a["hello"]=[2,3,4]
a["bye"]=[0,10,100]
a["goodbye"]=[2,5,50]
I was interested to compute the sum across all entries in a using only position 1 of their respective lists.
In the example, I wanted to sum:
finalsum=sum([3,10,5]) #-----> 18
Just skip the keys entirely, since they don't really matter.
sum(i[1] for i in a.values())  # on Python 2, a.itervalues() avoids building a list
Also as a side note, you don't need to do a.keys() when iterating over a dict, you can just say for key in a and it will use the keys.
In Python 2, a.values() returns a list of all the values in a dict, while a.itervalues() iterates without constructing a new list. In Python 3, a.values() already returns a lightweight view and itervalues() no longer exists. As far as I can tell, the keys are irrelevant. With the lazy form and a generator expression as the argument to sum, no extraneous lists are created.
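For reference, a Python 3 version using the data from the question might look like this (the variable names are only illustrative):

```python
a = {}
a["hello"] = [2, 3, 4]
a["bye"] = [0, 10, 100]
a["goodbye"] = [2, 5, 50]

# Generator expression over the values view: no intermediate list is built.
finalsum = sum(values[1] for values in a.values())  # 3 + 10 + 5
```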
I used a list comprehension for my one-line solution (here separated into two lines):
elements=[a[pos][1] for pos in a.keys()] #----> [3,5,10]
finalsum=sum(elements)
I'm happy with this solution :) , but, any other suggestions?
Ok, this one should be simple. I have 3 dictionaries. They are all made, ordered, and filled to my satisfaction but I would like to put them all in an overarching dictionary so I can reference and manipulate them more easily and efficiently.
Layer0 = {}
Layer1 = {}
Layer2 = {}
here they are when created, and afterwards I feebly tried different things based on SO questions:
Layers = {Layer0, Layer1, Layer2}
which raised a syntax error
Layers = {'Layer0', 'Layer1', 'Layer2'}
which raised another syntax error
(Layers is the Dictionary I'm trying to create that will have all the previously made dictionaries within it)
All the other examples I found on SO have been related to creating dictionaries within dictionaries in order to fill them (or filling them simultaneously) and since I already coded a large number of lines to make these dictionaries, I'd rather put them into a dictionary after the fact instead of re-writing code.
It would be best if the order of the dictionaries are preserved when put into Layers
Does anyone know if this is possible and how I should do it?
Dictionary items have both a key and a value.
Layers = {'Layer0': Layer0, 'Layer1': Layer1, 'Layer2': Layer2}
Keep in mind that a dictionary is a hash table: each key is hashed to decide where its entry is stored. Before Python 3.7, that meant dictionaries had no guaranteed order, and .keys() or .values() gave you a list in an arbitrary order. Since Python 3.7, dictionaries are guaranteed to preserve insertion order; on older versions, use collections.OrderedDict if you need that.
So whether "It would be best if the order of the dictionaries are preserved when put into Layers" can be satisfied depends on your Python version. On those older versions, for example, if you rename your dictionaries from "Layer1, Layer2, Layer3" to "A, B, C," you may see that Layers.keys() prints in the order "A, C, B," regardless of the order you used when building the dictionary. All this shows is how the keys happened to hash, and it doesn't tell you anything about the structure of your dictionary.
You can iterate over a dictionary directly (for key in Layers yields the keys), but on those older versions the iteration order is just as arbitrary.
As a side note, this hash function is what allows a dictionary to do crazy fast lookups. A good hash function will give you constant time [O(1)] lookup, meaning you can check if a given item is in your dictionary in the same amount of time whether the dictionary contains ten items or ten million. Pretty cool.
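A short sketch of the wrapping dictionary, assuming Python 3.7 or later for the ordering guarantee. The "depth" entries are placeholder contents invented for the example.

```python
# Placeholder contents; the real layers are built elsewhere in your code.
Layer0 = {"depth": 0}
Layer1 = {"depth": 1}
Layer2 = {"depth": 2}

# The outer dictionary stores references to the existing dictionaries,
# so nothing is copied and nothing has to be rebuilt.
Layers = {"Layer0": Layer0, "Layer1": Layer1, "Layer2": Layer2}

# A change made through Layers is visible on the original object too.
Layers["Layer1"]["depth"] = 10

# On Python 3.7+, iteration follows insertion order.
layer_names = list(Layers)
```

If the names are just sequential, a plain list [Layer0, Layer1, Layer2] indexed by position is another option that keeps its order on every Python version.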