Say I have two dictionaries:
dict_one = {'abc': 5, 'bat': 1, 'car': 6, 'xray': 3}
dict_two = {'abc': 2, 'jfk': 4, 'zit': 7}
I want to compare the keys of both and create a new dictionary which contains only the keys of dict_one which don't occur in dict_two.
The new dictionary would look like this:
unique_dict = {'bat': 1, 'car': 6, 'xray': 3}
This is what I am trying at the moment (the first part found here: Python read two dictionaries compare values and create third dictionary)
However I know the problem is that I can't update(key, value) as it takes only one argument, but I just don't know how to do this correctly.
d1_values = set(dict_one.keys())
d2_values = set(dict_two.keys())
words_in_both = d1_values & d2_values
not_in_both = d1_values ^ d2_values
unique_dict = {}
for key, value in dict_one.items():
    for word in words_in_both:
        if key != word:
            unique_dict.update(key, value) # this is not correct
You could use the following dictionary comprehension to keep the key/value pairs in dict_one if they are not in dict_two:
{k:v for k,v in dict_one.items() if k not in dict_two}
# {'bat': 1, 'car': 6, 'xray': 3}
Sets and dict views already support subtraction to keep only the values in the left hand side not found in the right. So your code could simplify to:
{k: dict_one[k] for k in dict_one.keys() - dict_two.keys()}
yatu's answer is probably better in this specific case (no set temporary required), but I figured I'd point out set/view subtraction as well as showing no conversion to set itself is necessary (views can be used as set-like objects already).
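To make the difference between the two approaches concrete, here is a minimal sketch using the dictionaries from the question (with the keys written as strings, which is an assumption about what the question intended). Both give the same mapping; the only practical difference is that the comprehension with a membership test keeps the insertion order of dict_one, while the result of view subtraction is a plain set with no guaranteed order.

```python
dict_one = {"abc": 5, "bat": 1, "car": 6, "xray": 3}
dict_two = {"abc": 2, "jfk": 4, "zit": 7}

# Comprehension with a membership test: keeps dict_one's insertion order.
by_filter = {k: v for k, v in dict_one.items() if k not in dict_two}

# View subtraction: keys() - keys() returns a plain set, so the
# iteration order of the dict built from it is not guaranteed.
only_in_one = dict_one.keys() - dict_two.keys()
by_subtraction = {k: dict_one[k] for k in only_in_one}

print(by_filter)
print(by_filter == by_subtraction)  # dict equality ignores ordering
```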
def filter_dicts(d1, d2):
    """Return d1 items that are not in d2"""
    return {title: value for title, value in d1.items() if title not in d2}
I have data (counts) indexed by user_id and analysis_type_id obtained from a database. It's a list of 3-tuple. Sample data:
counts = [(4, 1, 4), (3, 5, 4), (2, 10, 4), (2, 10, 5)]
where the first item of each tuple is the count, the second the analysis_type_id, and the last the user_id.
I'd like to place that into a dictionary, so i can retrieve the counts quickly: given a user_id and analysis_type_id. It would have to be a two-level dictionary. Is there any better structure?
To construct the two-level dictionary "by hand", I would code:
dict = {4:{1:4,5:3,10:2},5:{10:2}}
Where user_id is the first dict key level, analysis_type_id is the second (sub-) key, and the count is the value inside the dict.
How would I create the "double-depth" in dict keys through list comprehension?
Or do I need to resort to a nested for-loop, where I first iterate through unique user_id values, then find matching analysis_type_id and fill in the counts ... one-at-a-time into the dict?
Two Tuple Keys
I would suggest abandoning the idea of nesting dictionaries and simply using two-tuples as the keys directly. Like so:
d = { (user_id, analysis_type_id): count for count, analysis_type_id, user_id in counts}
The dictionary is a hash table. In Python, each two-tuple has a single hash value (not two hash values), so a lookup with a two-tuple key is one hash-table lookup. A nested dict needs two lookups (first the user_id, then the analysis_type_id), so the flat dict is usually somewhat faster, although the exact difference depends on your data and is worth measuring rather than assuming.
However, beware of premature optimization. Unless you're doing millions of lookups, the increase in performance of the flat dict is unlikely to matter. The real reason to favor the two-tuple here is that the syntax and readability of a two-tuple solution are far superior to those of the other solutions; that is, assuming that the vast majority of the time you will want to access items based on a pair of values and not groups of items based on a single value.
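As a rough illustration of that trade-off, the sketch below builds the flat dict from the sample data in the question and shows both kinds of access. The variable names flat and for_user_4 are made up for this example. Looking up a pair is a single subscript, whereas collecting everything for one user_id means scanning all the keys, which is the case where a nested dict would be more convenient.

```python
counts = [(4, 1, 4), (3, 5, 4), (2, 10, 4), (2, 10, 5)]

# Flat dict keyed by (user_id, analysis_type_id).
flat = {(user_id, analysis_type_id): count
        for count, analysis_type_id, user_id in counts}

print(flat[(4, 5)])          # one lookup with a pair
print(flat.get((4, 99), 0))  # missing pair, so the supplied default is used

# Grouping by a single value requires a scan over every key.
for_user_4 = {a_id: c for (u_id, a_id), c in flat.items() if u_id == 4}
print(for_user_4)
```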
Consider Using a namedtuple
It may be convenient to create a named tuple for storing those keys. Do that this way:
from collections import namedtuple
IdPair = namedtuple("IdPair", "user_id, analysis_type_id")
Then use it in your dictionary comprehension:
d = { IdPair(user_id, analysis_type_id): count for count, analysis_type_id, user_id in counts}
And access a count you're interested in like this:
somepair = IdPair(user_id = 4, analysis_type_id = 1)
d[somepair]
The reason this is sometimes useful is you can do things like this:
user_id = somepair.user_id # very nice syntax
Some Other Useful Options
One downside of the above solution is the case in which your lookup fails. In that case, you will only get a traceback like the following:
>>> d[IdPair(0,0)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: IdPair(user_id=0, analysis_type_id=0)
This isn't very helpful; was it the user_id that was unmatched, or the analysis_type_id, or both?
You can create a better tool for yourself by creating your own dict type that gives you a nice traceback with more information. It might look something like this:
class CountsDict(dict):
    """A dict for storing IdPair keys and count values as integers.
    Provides more detailed traceback information than a regular dict.
    """
    def __getitem__(self, k):
        try:
            return super().__getitem__(k)
        except KeyError as exc:
            raise self._handle_bad_key(k, exc) from exc
    def _handle_bad_key(self, k, exc):
        """Provides a custom exception when a bad key is given."""
        try:
            user_id, analysis_type_id = k
        except (TypeError, ValueError):
            # k cannot be unpacked into two ids, so reuse the original KeyError
            return exc
        has_u_id = next((True for u_id, _ in self if u_id==user_id), False)
        has_at_id = next((True for _, at_id in self if at_id==analysis_type_id), False)
        exc_lookup = {(False, False):KeyError(f"CountsDict missing pair: {k}"),
                      (True, False):KeyError(f"CountsDict missing analysis_type_id: "
                                             f"{analysis_type_id}"),
                      (False, True):KeyError(f"CountsDict missing user_id: {user_id}"),
                      # both ids exist somewhere, but never together
                      (True, True):KeyError(f"CountsDict missing pair: {k}")}
        # index by the two flags computed above, not by the ids themselves
        return exc_lookup[(has_u_id, has_at_id)]
Use it just like a regular dict.
However, it may make MORE sense to simply add new pairs to your dict (with a count of zero) when you try to access a missing pair. If this is the case, I'd use a defaultdict and have it set the count to zero (using the default value of int as the factory function) when a missing key is accessed. Like so:
from collections import defaultdict
my_dict = defaultdict(int,  # the factory must be passed positionally
                      (((user_id, analysis_type_id), count) for count, analysis_type_id, user_id in counts))
Now if you attempt to access a key that is missing, the count will be set to zero. However, one problem with this method is that ANY missing key will be added with a count of zero, whether or not it is a valid pair:
value = my_dict["I'm not a two tuple, sucka!!!!"] # <-- will be added to my_dict
To prevent this, we go back to the idea of making a CountsDict, except in this case, your special dict will be a subclass of defaultdict. However, unlike a regular defaultdict, it will check to make sure the key is a valid kind before it is added. And as a bonus, we can make sure ANY two tuple that is added as a key becomes an IdPair.
from collections import defaultdict
class CountsDict(defaultdict):
    """A dict for storing IdPair keys and count values as integers.
    Missing two-tuple keys are converted to an IdPair. Invalid keys raise a KeyError.
    """
    def __getitem__(self, k):
        try:
            user_id, analysis_type_id = k
        except (TypeError, ValueError):
            raise KeyError(f"The provided key {k!r} is not a valid key.")
        else:
            # convert two tuple to an IdPair if it was not already
            k = IdPair(user_id, analysis_type_id)
        return super().__getitem__(k)
Use it just like the regular defaultdict:
my_dict = CountsDict(int,  # the factory must be passed positionally
                     (((user_id, analysis_type_id), count) for count, analysis_type_id, user_id in counts))
NOTE: In the above I have not made it so that two tuple keys are converted to IdPairs upon instance creation (because __setitem__ is not utilized during instance creation). To create this functionality, we would also need to implement an override of the __init__ method.
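One possible way to add that conversion is sketched below. This is only an illustration, not a definitive implementation: the class name NormalizingCountsDict and the helper _as_id_pair are made up for this example, and the constructor assumes the initial data is an iterable of (key, count) pairs rather than a mapping. The idea is to route every insertion through __setitem__, including the ones made at instance creation.

```python
from collections import defaultdict, namedtuple

IdPair = namedtuple("IdPair", "user_id, analysis_type_id")

class NormalizingCountsDict(defaultdict):
    """Hypothetical variant that stores every key as an IdPair."""

    def __init__(self, default_factory=None, pairs=()):
        super().__init__(default_factory)
        # Insert one item at a time so that __setitem__ sees every key.
        for key, count in pairs:
            self[key] = count

    @staticmethod
    def _as_id_pair(k):
        try:
            user_id, analysis_type_id = k
        except (TypeError, ValueError):
            raise KeyError(f"The provided key {k!r} is not a valid key.") from None
        return IdPair(user_id, analysis_type_id)

    def __setitem__(self, k, v):
        super().__setitem__(self._as_id_pair(k), v)

    def __getitem__(self, k):
        return super().__getitem__(self._as_id_pair(k))

counts = [(4, 1, 4), (3, 5, 4), (2, 10, 4), (2, 10, 5)]
my_dict = NormalizingCountsDict(
    int, (((user_id, analysis_type_id), count)
          for count, analysis_type_id, user_id in counts))

print(my_dict[(4, 1)])  # plain tuples still work for lookups
print(my_dict[(9, 9)])  # missing pair: added with the default count
print(list(my_dict))    # every stored key is an IdPair
```

Other entry points such as update() or setdefault() are not covered by this sketch and would need the same treatment if you rely on them.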
Wrap Up
Out of all of these, the most useful option depends entirely on your use case.
The most readable solution utilizes a defaultdict, which saves you nested loops and clumsy checks for whether keys already exist:
from collections import defaultdict
dct = defaultdict(dict) # do not shadow the built-in 'dict'
for x, y, z in counts:
    dct[z][y] = x
dct
# defaultdict(dict, {4: {1: 4, 5: 3, 10: 2}, 5: {10: 2}})
If you really want a one-liner comprehension you can use itertools.groupby and this clunkiness:
from itertools import groupby
dct = {k: {y: x for x, y, _ in g} for k, g in groupby(sorted(counts, key=lambda c: c[2]), key=lambda c: c[2])}
If your initial data is already sorted by user_id, you can save yourself the sorting.
This is a good use for the defaultdict object. You can create a defaultdict whose elements are always dicts. Then you can just stuff the counts into the right dicts, like this:
from collections import defaultdict
counts = [(4, 1, 4), (3, 5, 4), (2, 10, 4), (2, 10, 5)]
dct = defaultdict(dict)
for count, analysis_type_id, user_id in counts:
    dct[user_id][analysis_type_id]=count
dct
# defaultdict(dict, {4: {1: 4, 5: 3, 10: 2}, 5: {10: 2}})
# if you want a 'normal' dict, you can finish with this:
dct = dict(dct)
Or you can just use standard dicts with setdefault:
counts = [(4, 1, 4), (3, 5, 4), (2, 10, 4), (2, 10, 5)]
dct = dict()
for count, analysis_type_id, user_id in counts:
    dct.setdefault(user_id, dict())
    dct[user_id][analysis_type_id]=count
dct
# {4: {1: 4, 5: 3, 10: 2}, 5: {10: 2}}
I don't think you can do this neatly with a list comprehension, but there's no need to be afraid of a for-loop for this kind of thing.
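For what it's worth, the setdefault version can be shortened a little, because setdefault returns the inner dict whether or not it had to create it. The sketch below also shows one way to read a count safely when either level might be missing; the ids 7 and 1 in the last line are arbitrary values chosen for illustration.

```python
counts = [(4, 1, 4), (3, 5, 4), (2, 10, 4), (2, 10, 5)]

dct = {}
for count, analysis_type_id, user_id in counts:
    # setdefault returns the inner dict, creating it first if needed
    dct.setdefault(user_id, {})[analysis_type_id] = count

print(dct)
print(dct[4][10])                # both levels present
print(dct.get(7, {}).get(1, 0))  # unknown user_id falls back to 0
```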
You could use the following logic. There is no need to import any package; a nested comprehension is enough. Note that it rescans counts for every tuple, so it is best suited to small inputs.
counts = [(4, 1, 4), (3, 5, 4), (2, 10, 4), (2, 10, 5)]
dct = {x[2]:{y[1]:y[0] for y in counts if x[2] == y[2]} for x in counts }
"""output will be {4: {1: 4, 5: 3, 10: 2}, 5: {10: 2}} """
You can use a comprehension with nested loops and a condition, and use one or more of the loop variables to select elements:
# create dict with tuples
line_dict = {str(nest_list[0]) : nest_list[1:] for nest_list in nest_lists for elem in nest_list if elem== nest_list[0]}
print(line_dict)
# create dict with list
line_dict1 = {str(nest_list[0]) : list(nest_list[1:]) for nest_list in nest_lists for elem in nest_list if elem== nest_list[0]}
print(line_dict1)
Example: nest_lists = [("a","aa","aaa","aaaa"), ("b","bb","bbb","bbbb"), ("c","cc","ccc","cccc"), ("d","dd","ddd","dddd")]
Output: {'a': ('aa', 'aaa', 'aaaa'), 'b': ('bb', 'bbb', 'bbbb'), 'c': ('cc', 'ccc', 'cccc'), 'd': ('dd', 'ddd', 'dddd')}, {'a': ['aa', 'aaa', 'aaaa'], 'b': ['bb', 'bbb', 'bbbb'], 'c': ['cc', 'ccc', 'cccc'], 'd': ['dd', 'ddd', 'dddd']}
I'm taking two lists of tuples, doing some calculations and creating a dictionary, or at least trying to, so that I can find the character with the largest percentage.
dict_original = dict(original_list)
for c, p in new_list:
    if c in dict_original and dict_original[c] < p:
        diff = p - dict_original[c]
        output = {c:round(diff,3)}
        print(output)
The output i'm getting is something like this:
{'o': 0.026}
{'x': 0.046}
{'t': 0.037}
{'/': 0.038}
{'p': 0.037}
{'s': 0.038}
All I want is the character with the largest percentage; 'x', in this case. I've been unsuccessful using max so far.
I know, my output seems to be a bunch of dictionaries, that's why i'm asking for some help here.
Thanks!
Without seeing your data it is hard to give you proper guidance.
You are not creating a new dictionary of all the results, just one dict per result, which is then discarded. You can create a complete dictionary using a dict comprehension, e.g. this is equivalent to your for loop:
do = dict_original
output = {c: round(p-do[c], 3) for c, p in new_list if do.get(c, float('inf')) < p}
To get the key with the maximum value from this dict, as pointed out by @batman:
max(output, key=output.get)
Would return 'x'
You can pass max a key argument:
bar = max(foo, key=foo.get)
That will give you the key with the largest value.
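Here is a small sketch of that call, using the values printed in the question collected into a single dict (the name foo follows the snippet above). It also shows a variant that returns the value along with the key.

```python
foo = {'o': 0.026, 'x': 0.046, 't': 0.037, '/': 0.038, 'p': 0.037, 's': 0.038}

# Iterating a dict yields its keys; key=foo.get ranks them by value.
best_key = max(foo, key=foo.get)
print(best_key)

# To get the value as well, rank the (key, value) pairs by their value.
best_pair = max(foo.items(), key=lambda kv: kv[1])
print(best_pair)
```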
I am trying to create a simple dictionary of each letter with a number afterward (from 1-26), like this: {'a': 1, 'b': 2, 'c': 3, ...}.
I wanted to try using a dictionary comprehension to do this, so I did:
from string import lowercase
d = {s:i for s in lowercase for i in range(1, 27)}
However, this results in: {'a': 26, 'b': 26, 'c': 26, ...}. I think this happens because it's iterating over every value in lowercase, assigning it to 1, then 2, then 3 (for every value) ending at 26. There are only 26 keys because since it's a dictionary, it won't have two keys of the same letter (so it overwrites all of them to 26 at the end). I am not sure how to fix this, so if I could get guidance on how to actually do this, that would be great.
I got it to work using dict() and zip(): dict(zip(lowercase, range(1, 27))). However, I want to know how to do this using a dictionary comprehension. Thanks!
With enumerate:
{s: i for i, s in enumerate(lowercase, 1)}
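Note that string.lowercase only exists in Python 2. A sketch of the same comprehension for Python 3, where the constant is called ascii_lowercase, might look like this; the zip variant is included only to show the comprehension equivalent of the dict(zip(...)) call from the question.

```python
from string import ascii_lowercase

# enumerate(..., 1) pairs each letter with its 1-based position.
d = {s: i for i, s in enumerate(ascii_lowercase, 1)}

# Same result by walking the two sequences in lockstep.
d_zip = {s: i for s, i in zip(ascii_lowercase, range(1, 27))}

print(d['a'], d['z'])
```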
Consider the following dictionary, d:
d = {'a': 3, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
I want to return the first N key:value pairs from d (N <= 4 in this case). What is the most efficient method of doing this?
In older versions of Python there's no such thing as the "first n" keys, because a dict doesn't remember which keys were inserted first (insertion order is only guaranteed from Python 3.7 onwards).
You can get any n key-value pairs though:
n_items = take(n, d.items())
This uses the implementation of take from the itertools recipes:
from itertools import islice
def take(n, iterable):
    """Return the first n items of the iterable as a list."""
    return list(islice(iterable, n))
See it working online: ideone
For Python 2 (where iteritems() exists):
n_items = take(n, d.iteritems())
An efficient way to retrieve a few items is to combine list or dictionary comprehensions with slicing. If you don't need to order the items (you just want n arbitrary pairs), you can use a dictionary comprehension like this:
# Python 2
first2pairs = {k: mydict[k] for k in mydict.keys()[:2]}
# Python 3
first2pairs = {k: mydict[k] for k in list(mydict)[:2]}
A comprehension like this is generally faster than the equivalent "for x in y" loop. Also, by making a list of the dictionary keys and slicing that list, you avoid 'touching' any unnecessary keys when you build the new dictionary.
If you don't need the keys (only the values) you can use a list comprehension:
first2vals = list(mydict.values())[:2]  # Python 2 can slice mydict.values() directly
If you need the values sorted based on their keys, it's not much more trouble:
first2vals = [mydict[k] for k in sorted(mydict.keys())[:2]]
or if you need the keys as well:
first2pairs = {k: mydict[k] for k in sorted(mydict.keys())[:2]}
To get the top N elements from your python dictionary one can use the following line of code:
list(dictionaryName.items())[:N]
In your case you can change it to:
list(d.items())[:4]
Before Python 3.7, Python's dicts were not guaranteed to be ordered, so in those versions it's meaningless to ask for the "first N" keys.
The collections.OrderedDict class is available if that's what you need. You could efficiently get its first four elements as
import itertools
import collections
d = collections.OrderedDict((('foo', 'bar'), (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')))
x = itertools.islice(d.items(), 0, 4)
for key, value in x:
    print(key, value)
itertools.islice allows you to lazily take a slice of elements from any iterator. If you want the result to be reusable you'd need to convert it to a list or something, like so:
x = list(itertools.islice(d.items(), 0, 4))
foo = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5, 'f':6}
iterator = iter(foo.items())
for i in range(3):
    print(next(iterator))
Basically, turn the view (dict_items) into an iterator, and then iterate it with next().
in py3, this will do the trick
{A:N for (A,N) in [x for x in d.items()][:4]}
{'a': 3, 'b': 2, 'c': 3, 'd': 4}
You can get dictionary items by calling .items() on the dictionary. Then convert that to a list, and from there get the first N items as you would on any list.
The code below prints the first 3 items of the dictionary object, e.g.
d = {'a': 3, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
first_three_items = list(d.items())[:3]
print(first_three_items)
Outputs:
[('a', 3), ('b', 2), ('c', 3)]
For Python 3.8 the correct answer should be:
import more_itertools
d = {'a': 3, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
first_n = more_itertools.take(3, d.items())
print(len(first_n))
print(first_n)
Whose output is:
3
[('a', 3), ('b', 2), ('c', 3)]
After pip install more-itertools of course.
Did not see it on here. Will not be ordered but the simplest syntactically if you need to just take some elements from a dictionary.
n = 2
{key:value for key,value in list(d.items())[0:n]}  # Python 3 needs list(); Python 2 can slice d.items() directly
Where d is your dictionary and n is the number of items to print:
for idx, (k, v) in enumerate(d.items()):
    if idx == n: break
    print(k, v)
Casting your dictionary to a list can be slow.
If your dictionary is large, you don't need to convert all of it just to print the first few items.
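If you want the first few pairs as a new dict rather than printed, a minimal sketch of the same idea uses itertools.islice, which stops after n items instead of building a full list first. It assumes Python 3.7+, where iteration follows insertion order.

```python
from itertools import islice

d = {'a': 3, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
n = 3

# islice consumes only the first n (key, value) pairs from the items view.
first_n = dict(islice(d.items(), n))
print(first_n)

# Asking for more items than exist simply returns everything.
print(dict(islice(d.items(), 10)) == d)
```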
See PEP 265 on sorting dictionaries. Then use the aforementioned iterable code.
If you need more efficiency with sorted key-value pairs, use a different data structure, that is, one that maintains sorted order and the key-value associations.
E.g.
import bisect
kvlist = [('a', 1), ('b', 2), ('c', 3), ('e', 5)]
bisect.insort_left(kvlist, ('d', 4))
print(kvlist) # [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5)]
just add an answer using zip,
{k: d[k] for k, _ in zip(d, range(n))}
This will work for python 3.8+:
d_new = {k:v for i, (k, v) in enumerate(d.items()) if i < n}
This depends on what is 'most efficient' in your case.
If you just want a semi-random sample of a huge dictionary foo, use foo.iteritems() and take as many values from it as you need, it's a lazy operation that avoids creation of an explicit list of keys or items.
If you need to sort keys first, there's no way around using something like keys = foo.keys(); keys.sort() or sorted(foo.iterkeys()), you'll have to build an explicit list of keys. Then slice or iterate through first N keys.
BTW why do you care about the 'efficient' way? Did you profile your program? If you did not, use the obvious and easy to understand way first. Chances are it will do pretty well without becoming a bottleneck.
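For the sorted case, a rough Python 3 sketch of "sort the keys, then slice" could look like the following (iteritems() and iterkeys() no longer exist there). The dict contents are made up for illustration. heapq.nsmallest is shown as an alternative that avoids sorting the complete key list when n is small compared with the size of the dict.

```python
import heapq

foo = {'e': 5, 'b': 2, 'a': 3, 'd': 4, 'c': 3}
n = 2

# Sort all keys, then keep the first n.
first_keys = sorted(foo)[:n]
first_pairs = {k: foo[k] for k in first_keys}
print(first_pairs)

# Same keys without fully sorting; iterating a dict yields its keys.
smallest_keys = heapq.nsmallest(n, foo)
print(smallest_keys)
```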
For Python 3 and above, to select the first n pairs:
n=4
firstNpairs = {k: Diction[k] for k in list(Diction.keys())[:n]}
This might not be very elegant, but works for me:
d = {'a': 3, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
x = 0
for key, val in d.items():
    if x == 2:
        break
    else:
        x += 1
        # Do something with the first two key-value pairs
You can approach this a number of ways. If order is important you can do this:
for key in sorted(d.keys()):
    item = d.pop(key)
If order isn't a concern you can do this:
for i in range(4):
    item = d.popitem()
A dictionary does not keep its items sorted, so before picking the top N key-value pairs, let's sort it.
import operator
d = {'a': 3, 'b': 2, 'c': 3, 'd': 4}
d=dict(sorted(d.items(),key=operator.itemgetter(1),reverse=True))
#itemgetter(0)=sort by keys, itemgetter(1)=sort by values
Now we can retrieve the top 'N' elements using a function like this:
def return_top(elements,dictionary_element):
    '''Takes the dictionary and the 'N' elements needed in return
    '''
    topers={}
    for h,i in enumerate(dictionary_element):
        if h<elements:
            topers.update({i:dictionary_element[i]})
    return topers
to get the top 2 elements then simply use this structure:
d = {'a': 3, 'b': 2, 'c': 3, 'd': 4}
d=dict(sorted(d.items(),key=operator.itemgetter(1),reverse=True))
d=return_top(2,d)
print(d)
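As an alternative sketch (not the approach used above), heapq.nlargest from the standard library can pick the top N pairs by value in one call, without sorting the whole dictionary first. The name top_two is made up for this example.

```python
import heapq
from operator import itemgetter

d = {'a': 3, 'b': 2, 'c': 3, 'd': 4}

# Rank the (key, value) pairs by value and keep the two largest.
top_two = dict(heapq.nlargest(2, d.items(), key=itemgetter(1)))
print(top_two)
```

Here 'a' and 'c' tie on the value 3, so only one of them makes it into the result alongside 'd'.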
consider a dict
d = {'a': 3, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
from itertools import islice
n = 3
list(islice(d.items(),n))
islice will do the trick :)
hope it helps !
I have tried a few of the answers above and note that some of them are version dependent and do not work in version 3.7.
I also note that since 3.6 all dictionaries are ordered by the sequence in which items are inserted.
Despite dictionaries being ordered since 3.6 some of the statements you expect to work with ordered structures don't seem to work.
The answer to the OP question that worked best for me.
itr = iter(dic.items())
lst = [next(itr) for i in range(3)]
from random import uniform  # needed for the example dict below
def GetNFirstItems(self):
    self.dict = {f'Item{i + 1}': round(uniform(20.40, 50.50), 2) for i in range(10)}#Example Dict
    self.get_items = int(input())
    for self.index,self.item in zip(range(len(self.dict)),self.dict.items()):
        if self.index==self.get_items:
            break
        else:
            print(self.item,",",end="")
Unusual approach, as it gives out intense O(N) time complexity.
I like this one because no new list needs to be created, it's a one-liner which does exactly what you want, and it works with Python >= 3.8 (where dictionaries are indeed ordered; the guarantee dates from Python 3.7):
new_d = {kv[0]:kv[1] for i, kv in enumerate(d.items()) if i < 4}