Sum across all list positions in a dictionary - python

This is a question and answer I wanted to share, since I found it very useful.
Suppose I have a dictionary with several keys, and the value stored under each key is a list of a fixed length:
a={}
a["hello"]=[2,3,4]
a["bye"]=[0,10,100]
a["goodbye"]=[2,5,50]
I wanted to compute the sum across all entries in a, using only position 1 of their respective lists.
In the example, I wanted to sum:
finalsum=sum([3,10,5]) #-----> 18

Just skip the keys entirely, since they don't really matter.
sum(i[1] for i in a.itervalues())
Also, as a side note, you don't need to call a.keys() when iterating over a dict; you can just write for key in a and it will iterate over the keys.
In Python 2, a.values() returns a list of all the values in the dict, while a.itervalues() iterates over them without constructing a new list. As far as I can tell, the keys are irrelevant here. By using itervalues() and a generator expression as the argument to sum, no extraneous lists are created. In Python 3, itervalues() no longer exists; a.values() returns a view rather than a list, so sum(i[1] for i in a.values()) is the equivalent.
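For anyone on Python 3, here is a minimal sketch of the same idea using the dictionary from the question. The variable name position and the per-column sums at the end are my own additions for illustration, not part of the original answer.

```python
# The dictionary from the question.
a = {
    "hello": [2, 3, 4],
    "bye": [0, 10, 100],
    "goodbye": [2, 5, 50],
}

position = 1  # which list index to sum across all entries

# a.values() is a view in Python 3, and the generator expression
# feeds sum() one number at a time, so no intermediate list is built.
finalsum = sum(values[position] for values in a.values())
print(finalsum)  # 18

# If every position is needed, zip(*...) groups the lists column by column.
column_sums = [sum(column) for column in zip(*a.values())]
print(column_sums)  # [4, 18, 154]
```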

I used a list comprehension for my one-line solution (here separated into two lines):
elements=[a[pos][1] for pos in a.keys()] #----> [3,5,10]
finalsum=sum(elements)
I'm happy with this solution :) , but, any other suggestions?

Related

Unpacking a list of dictionaries to get all their keys

I am writing a script to add missing keys within a list of dictionaries and assign them a default value. I start by building a set of all the possible keys that appear in one or more dictionaries.
I adapted a nice snippet of code for this but I'm having trouble fully wrapping my head around how it works:
all_keys = set().union(*dicts)
From how I understand this, my list of dictionaries dicts is unpacked into individual (dictionary) arguments for the union method, which merges them all together with the empty set, giving me a set of keys.
What isn't clear to me is why this code builds the set using just the keys of the dictionaries, while discarding their associated values. This is in fact what I want to happen, but how it is achieved here is murky. For example, I know unpacking a dictionary with a single * unpacks just the keys, which seems like what is happening here, except in my code I am not explicitly unpacking the contents of the dictionaries, only the list that contains them.
Can someone explain to me a little more explicitly what is happening under the hood here?
If you wrote:
s1 = set()
s2 = s1.union(iterable1, iterable2, iterable3)
the union() method would iterate over each iterableX to get the values to combine with s1.
Your code is simply getting all the iterables by spreading dicts, so it's equivalent to
s2 = s1.union(dicts[0], dicts[1], dicts[2], ...)
and it iterates over each dictionary. Iterating over a dictionary yields only its keys, which is why the values never make it into the set.
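To make that concrete, here is a small sketch with made-up dictionaries. The sample data, the default value and the names same_keys and filled are assumptions for illustration only.

```python
# Hypothetical sample data: three dicts with partly overlapping keys.
dicts = [
    {"name": "a", "size": 1},
    {"name": "b", "color": "red"},
    {"size": 3, "weight": 10},
]

# *dicts passes each dictionary as a separate argument to union().
# union() iterates over each argument, and iterating a dict yields its keys.
all_keys = set().union(*dicts)

# The same thing written out by hand.
same_keys = set()
for d in dicts:
    for key in d:  # iterating a dict gives keys only
        same_keys.add(key)

# Filling in the missing keys with a default value (None is an assumption).
default = None
filled = [{key: d.get(key, default) for key in all_keys} for d in dicts]
```

After this runs, all_keys and same_keys hold the same four keys, and every dictionary in filled has all of them.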

Inserting a dictionary to a sorted list of dictionaries

I've been trying to insert a new dictionary into a sorted list of dictionaries while maintaining the order.
To sort the list of dictionaries I've used sample_dict = sorted(sample_dict, key=lambda k: k['ID'])
It seems the only solution would be to iterate over the list and compare the ID of each entry with the previous one but this solution does not sound optimal (time-wise).
I have also found the bisect library, which allows inserting entries into lists while keeping the correct order, but it seems it does not work with dictionaries (it throws TypeError: '<' not supported between instances of 'dict' and 'dict'). I also want to mention that my entries contain a lot of key-value pairs (21) and I am not sure if there are any alternatives to dictionaries (e.g. tuples). Lastly, I want to mention that "ID" is a string.
Is there something I am missing or is iterating over the whole list for each insertion the only solution?
Thanks in advance
Unfortunately, before Python 3.10 bisect doesn't allow you to give a key parameter like sorted does. But it's easy to get around that by keeping a list of tuples instead of dictionaries. Less-than on a tuple compares element by element, so if the first element of the tuple is your key then everything works.
sample_dict = sorted(((k['ID'], k) for k in sample_dict))
As mentioned in the comments, this still fails if two list items have the same ID because the comparison moves on to the second tuple element. The solution is to add another tuple element that is guaranteed to never be equal.
sample_dict = sorted(((k['ID'], index, k) for index,k in enumerate(sample_dict)))
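Here is a rough sketch of how the tuple approach could be combined with bisect.insort for later insertions. The sample records, the itertools.count tie-breaker and the helper name insert_record are my own assumptions, not something from the question.

```python
import bisect
import itertools

# Hypothetical records; only the "ID" key matters for ordering.
records = [
    {"ID": "b20", "value": 2},
    {"ID": "a10", "value": 1},
    {"ID": "c30", "value": 3},
]

# A running counter gives every entry a unique second element, so tuple
# comparison never falls through to comparing the dictionaries themselves.
counter = itertools.count()
entries = sorted((r["ID"], next(counter), r) for r in records)


def insert_record(entries, record):
    """Insert record into entries, keeping the list sorted by ID."""
    bisect.insort(entries, (record["ID"], next(counter), record))


insert_record(entries, {"ID": "b25", "value": 4})
insert_record(entries, {"ID": "a10", "value": 5})  # duplicate ID is fine

ids = [entry[0] for entry in entries]
print(ids)  # ['a10', 'a10', 'b20', 'b25', 'c30']
```

Note that insort finds the position with a binary search, but inserting into a Python list still shifts the later elements, so each insertion is O(n) overall. On Python 3.10 or later, bisect.insort also accepts a key argument, which lets you keep a plain list of dictionaries instead of wrapping them in tuples.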

Triple list VS double dictionary

I have 40.000 documents, 93.08 words per doc. on avg., where every word is a number (which can index a dictionary) and every word has a count (frequency).
I am between two data structures to store the data and was wondering which one I should choose, which one the Python people would choose!
Triple-list:
A list, where every node:
__ is a list, where every node:
__.... is a list of two values; word_id and count.
Double-dictionary:
A dictionary, with keys the doc_id and values dictionaries.
That value dictionary would have a word_id as a key and the count as a value.
I feel that the first will require less space (since it doesn't store the doc_id), while the second will be easier to handle and access. I mean, accessing the i-element in the list is O(n), while it is constant in the dictionary, I think. Which one should I choose?
You should use a dictionary. It will make handling your code easier to understand and to program and it will have a lower complexity as well.
The only reason you would use a list, is if you cared about the order of the documents.
If you don't care about the order of the items you should definitely use a dictionary because dictionaries are used to group associated data while lists are generally used to group more generic items.
Moreover, finding an item by key in a dictionary is faster than searching a list for it.
Searching a list for a value is O(n), while a dictionary lookup is O(1) on average. (Indexing a list by position is also O(1).) Dictionaries do use considerably more memory than lists, though.
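As an illustration, the double dictionary could be built roughly like this. The triples list is made-up sample data, and using collections.defaultdict is just one convenient way to do it.

```python
from collections import defaultdict

# Hypothetical input: (doc_id, word_id, count) triples.
triples = [(0, 17, 2), (0, 42, 1), (1, 17, 5), (2, 99, 3)]

# Outer dict: doc_id -> inner dict; inner dict: word_id -> count.
counts = defaultdict(dict)
for doc_id, word_id, count in triples:
    counts[doc_id][word_id] = count

print(counts[1][17])         # 5
print(counts[0].get(99, 0))  # 0, word 99 does not occur in document 0
```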
Essentially you just want to store a large amount of numbers, for which the most space efficient choice is an array. These are one-dimensional so you could write a class which takes in three indices (the last being 0 for word_id and 1 for count) and does some basic addition and multiplication to find the correct 1D index.
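A rough sketch of what such a class might look like is below. It assumes every document gets the same fixed number of slots (slots_per_doc), which the question does not guarantee, and the class name FlatCounts and its methods are invented for this example.

```python
from array import array


class FlatCounts:
    """Hypothetical fixed-shape store: n_docs x slots_per_doc x 2 integers."""

    def __init__(self, n_docs, slots_per_doc):
        self.slots_per_doc = slots_per_doc
        # One flat block of signed integers, initialised to zero.
        self.data = array("l", [0]) * (n_docs * slots_per_doc * 2)

    def _index(self, doc, slot, field):
        # field is 0 for word_id and 1 for count.
        return (doc * self.slots_per_doc + slot) * 2 + field

    def get(self, doc, slot, field):
        return self.data[self._index(doc, slot, field)]

    def set(self, doc, slot, field, value):
        self.data[self._index(doc, slot, field)] = value


store = FlatCounts(n_docs=3, slots_per_doc=4)
store.set(1, 2, 0, 17)  # word_id in document 1, slot 2
store.set(1, 2, 1, 5)   # its count
print(store.get(1, 2, 0), store.get(1, 2, 1))  # 17 5
```

With documents of varying length you would need either padding up to the longest document or a separate table of offsets, so this only pays off if memory really is the bottleneck.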

How do I sort a list in a dictionary

How can I sort the list present in the dictionary:
fourB={"James":[10,11,9]}
I will have multiple entries, but I want to be able to sort the list of integers for each one of them. How can I do that? Thanks! Any help will be appreciated. :)
for numbers in fourB.values():
    numbers.sort()
The above is better than iterating over the keys() followed by indexing into fourB, because here you avoid the dict lookup.
If you love one-liners, here's one:
map(list.sort, fourB.values())
But take note: in Python 2 this returns a list of [None]*len(fourB) which is immediately discarded, and that's not optimally efficient if there are many keys in the dict. In Python 3, map is lazy, so this line sorts nothing unless the result is consumed (for example by wrapping it in list(...)). I'd stick with the obvious loop version for these reasons and also for readability.
You should iterate over the keys and sort their values:
for k in fourB.keys():
    fourB[k].sort()
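For completeness, a short runnable sketch of the in-place loop, plus a non-mutating variant in case the original lists need to stay untouched. The "Thomas" entry and the names original and sorted_copy are made up for the example.

```python
# "James" is from the question; "Thomas" is a made-up second entry.
fourB = {"James": [10, 11, 9], "Thomas": [3, 1, 2]}

# In-place: each list inside the dict is sorted where it lives.
for numbers in fourB.values():
    numbers.sort()

print(fourB)  # {'James': [9, 10, 11], 'Thomas': [1, 2, 3]}

# Non-mutating: sorted() returns new lists, so the source dict is unchanged.
original = {"James": [10, 11, 9]}
sorted_copy = {name: sorted(numbers) for name, numbers in original.items()}
```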

Dictionary into dictionary in python

Ok, this one should be simple. I have 3 dictionaries. They are all made, ordered, and filled to my satisfaction but I would like to put them all in an overarching dictionary so I can reference and manipulate them more easily and efficiently.
Layer0 = {}
Layer1 = {}
Layer2 = {}
here they are when created, and afterwards I feebly tried different things based on SO questions:
Layers = {Layer0, Layer1, Layer2}
which raised a syntax error
Layers = {'Layer0', 'Layer1', 'Layer2'}
which raised another syntax error
(Layers is the Dictionary I'm trying to create that will have all the previously made dictionaries within it)
All the other examples I found on SO have been related to creating dictionaries within dictionaries in order to fill them (or filling them simultaneously) and since I already coded a large number of lines to make these dictionaries, I'd rather put them into a dictionary after the fact instead of re-writing code.
It would be best if the order of the dictionaries are preserved when put into Layers
Does anyone know if this is possible and how I should do it?
Dictionary items have both a key and a value.
Layers = {'Layer0': Layer0, 'Layer1': Layer1, 'Layer2': Layer2}
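A quick sketch of what that gives you, with some made-up contents for the layers. The outer dictionary stores references to the existing dictionaries, not copies, so nothing has to be rebuilt.

```python
# Made-up contents; in the question these dictionaries are already filled.
Layer0 = {"a": 1}
Layer1 = {"b": 2}
Layer2 = {}

Layers = {"Layer0": Layer0, "Layer1": Layer1, "Layer2": Layer2}

# Changes made through Layers are visible on the original object and
# vice versa, because both names refer to the same dictionary.
Layers["Layer2"]["c"] = 3
print(Layer2)                      # {'c': 3}
print(Layers["Layer0"] is Layer0)  # True
```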
Keep in mind that a dictionary is a hash table: each key is hashed to find the slot where its value is stored. Before Python 3.7, dictionaries did not guarantee any order. Since Python 3.7 they preserve insertion order, so Layer0, Layer1 and Layer2 come back in the order you added them.
So on older versions, when you say "It would be best if the order of the dictionaries are preserved when put into Layers", this doesn't really mean anything. For example, if you name your dictionaries "A, B, C," Layers.keys() may come back in an order like "A, C, B," regardless of the order you used when building the dictionary. That order only reflects where the keys happen to land in the hash table, and it doesn't tell you anything about the structure of your dictionary. If you need a guaranteed order on those versions, use collections.OrderedDict.
Note that you can iterate over a dictionary directly: for name in Layers yields the keys, and Layers.items() yields (key, value) pairs.
As a side note, this hash function is what allows a dictionary to do crazy fast lookups. A good hash function will give you constant time [O(1)] lookup, meaning you can check if a given item is in your dictionary in the same amount of time whether the dictionary contains ten items or ten million. Pretty cool.
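A small sketch of ordering, direct iteration and membership tests, assuming Python 3.7 or later (where dictionaries keep insertion order). The keys "A", "B", "C" are placeholders.

```python
# Placeholder layers, inserted in the order A, B, C.
Layers = {"A": {}, "B": {}, "C": {}}

# Iterating over the dict directly yields its keys, in insertion order
# on Python 3.7+.
keys_in_order = [name for name in Layers]
print(keys_in_order)  # ['A', 'B', 'C']

# Membership tests use the hash table, so they do not scan every key.
has_b = "B" in Layers
has_z = "Z" in Layers
print(has_b, has_z)  # True False
```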
