Create multiple dictionaries in an efficient way - python

I have to create more than 10 dictionaries. Is there a more efficient way to create multiple dictionaries using Python's built-in libraries as described below:
dict1_1= {
"value":100,
"secondvalue":200,
"thirdvalue":300
}
dict1_2= {
"fixedvalue":290,
"changedvalue":180,
"novalue":0
}

The dict builtin will create a dictionary from keyword arguments:
>>> dict(a=1, b=2)
{'a': 1, 'b': 2}
but you can't use integers as keyword arguments:
>>> dict(a=1, 2=2)
File "<stdin>", line 1
dict(a=1, 2=2)
^^
SyntaxError: expression cannot contain assignment, perhaps you meant "=="?
However, dict will also accept an iterable of key/value tuples, and in this case the keys may be integers:
>>> dict([('a', 1), (2, 2)])
{'a': 1, 2: 2}
If your keys are the same for all dicts you can use zip:
>>> keys = ('a', 2)
>>> values = [(1, 2), (3, 4)]
>>> for vs in values:
...     print(dict(zip(keys, vs)))
...
{'a': 1, 2: 2}
{'a': 3, 2: 4}
However, if your keys are not consistent, there's nothing wrong with using the literal {...} constructor. In fact, if it's efficiency that you want, the literal constructor may be the best choice.
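As a minimal sketch of the zip approach applied to several dictionaries at once, the rows below are turned into a list of dicts with a comprehension. The names keys, rows and dicts are illustrative, and the sketch assumes every dictionary shares the same keys, which is not true of the two dicts in the question.

```python
# Shared keys, plus one tuple of values per dictionary (illustrative data).
keys = ("value", "secondvalue", "thirdvalue")
rows = [(100, 200, 300), (290, 180, 0)]

# Build one dict per row by pairing each value with its key.
dicts = [dict(zip(keys, row)) for row in rows]

print(dicts[0])
print(dicts[1])
```

Keeping the results in a list (or in a dict keyed by name) avoids inventing a separate variable name for each of the ten-plus dictionaries.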

You can use a simple function to create new dictionaries. Look at the code below:
func = lambda **kwargs: kwargs
my_dict = func(x="test", y=1, z=[1, 'test'])
Note that with this approach the keys of the dictionary can only be strings, because keyword argument names must be strings.
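A short sketch of that restriction, assuming the same func as above: unpacking a mapping with a non-string key into keyword arguments raises a TypeError. The names ok and error are only for illustration.

```python
func = lambda **kwargs: kwargs

# String keys are fine when unpacked as keyword arguments.
ok = func(**{"x": 1})

# A non-string key is rejected when the call is made.
try:
    func(**{2: "two"})
except TypeError as exc:
    error = exc
else:
    error = None

print(ok, type(error).__name__)
```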

Related

Is there any straightforward option of unpacking a dictionary?

If I do something like this
some_obj = {"a": 1, "b": 2, "c": 3}
first, *rest = some_obj
I'll get a list, but I want it in 2 dictionaries: first = {"a": 1} and rest = {"b": 2, "c": 3}. As I understand, I can make a function, but I wonder if I can make it in one line, like in javascript with spread operator.
I don't know if there is a reliable way to achieve this in one line, but here is one method.
First unpack the key/value pairs with .items(). Iterating over some_obj directly only yields the keys.
>>> some_obj = {"a":1, "b":2, "c": 3}
>>> first, *rest = some_obj.items()
But this gives you tuples rather than dicts:
>>> first
('a', 1)
>>> rest
[('b', 2), ('c', 3)]
But you can again convert back to dict with just a dict call.
>>> dict([first])
{'a': 1}
>>> dict(rest)
{'b': 2, 'c': 3}
A oneliner inspired by Abdul Niyas P M's:
first, rest = dict([next(i := iter(some_obj.items()))]), dict(i)
Uses an assignment expression, introduced in Python 3.8 almost two years ago.
k = next(iter(some_obj)) # get the first key
first = {k: some_obj.pop(k)}
rest = some_obj
If you need to keep the original object intact - note this degrades from O(1) to O(n) in both time and space:
k = next(iter(some_obj))
rest = some_obj.copy()
first = {k: rest.pop(k)}
@AbdulNiyasPM's answer is perfectly fine, but since you asked for a one-liner, here's one way to do it (though you would have to do from operator import itemgetter first):
first, rest = map(dict, itemgetter(slice(0, 1), slice(1, None))(list(some_obj.items())))
If you prefer not to import anything, you can use a similar one-liner with a lambda function that takes one fixed argument and the rest as variable-length arguments:
first, rest = map(dict, (lambda f, *r: ((f,), r))(*some_obj.items()))
Demo: https://replit.com/#blhsing/ImpeccableEllipticalGeeklog
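If a one-liner is not a hard requirement, the same idea can be wrapped in a small helper. This is only a sketch: split_first is a hypothetical name, and returning two empty dicts for an empty input is an assumption about the desired behaviour.

```python
def split_first(mapping):
    """Return ({first_key: first_value}, {remaining items}) without mutating mapping."""
    items = iter(mapping.items())
    try:
        first = dict([next(items)])
    except StopIteration:
        # Empty input: nothing to split (assumed behaviour).
        return {}, {}
    # The iterator has already advanced past the first pair.
    return first, dict(items)


some_obj = {"a": 1, "b": 2, "c": 3}
first, rest = split_first(some_obj)
print(first, rest)
```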

Make a Python function that returns the same arguments as it receives

What is a proper way in Python to write a function that will return the very same parameters it received at run-time?
E.g.:
def pass_thru(*args, **kwargs):
    # do something non-destructive with *args & **kwargs
    return ??? <- somehow return *args & **kwargs
Consider the following function:
def a(*args, **kwargs):
    return args, kwargs
When we call the function, the value returned is a tuple, containing first another tuple with the arguments, then a dictionary with the keyword arguments:
b = a(1, 2, 3, a='foo')
print(b)
# Outputs: ((1, 2, 3), {'a': 'foo'})
print(b[0]) # Gives the args as a tuple
print(b[1]) # Gives the kwargs as a dictionary
The problem is that your arguments are just a sequence of values, not a value itself you can manipulate. Keyword arguments are not themselves first-class values (that is, a=3 is not a value); they are purely a syntactic construct.
* and ** parameters get you halfway there:
def pass_thru(*args, **kwargs):
    return *args, kwargs
Then
>>> pass_thru(1, 2, a=3)
(1, 2, {'a': 3})
but you can't simply pass that back to pass_thru; you'll get a different result.
>>> pass_thru(pass_thru(1,2,a=3))
((1, 2, {'a': 3}), {})
You can try unpacking the tuple:
>>> pass_thru(*pass_thru(1,2,a=3))
(1, 2, {'a': 3}, {})
but what you really need is to unpack the dict as well. Something like
>>> *a, kw = pass_thru(1,2,a=3)
>>> pass_thru(*a, **kw)
(1, 2, {'a': 3})
As far as I know, there is no way to combine the last example into a single, nested function call.
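One possible workaround, sketched under the assumption that the function returns the pair (args, kwargs) as in the first answer: a small helper can do the unpacking, so the round trip reads as a single nested call. apply_packed is a hypothetical name, not a built-in.

```python
def pass_thru(*args, **kwargs):
    # Return positional and keyword arguments as an (args, kwargs) pair.
    return args, kwargs


def apply_packed(func, packed):
    """Call func with an (args, kwargs) pair unpacked back into arguments."""
    args, kwargs = packed
    return func(*args, **kwargs)


once = pass_thru(1, 2, a=3)
twice = apply_packed(pass_thru, pass_thru(1, 2, a=3))
print(once)
print(twice)
```

Because the helper unpacks both the tuple and the dict, passing the result back through gives the same pair again.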

list comprehension to build a nested dictionary from a list of tuples

I have data (counts) indexed by user_id and analysis_type_id obtained from a database. It's a list of 3-tuple. Sample data:
counts = [(4, 1, 4), (3, 5, 4), (2, 10, 4), (2, 10, 5)]
where the first item of each tuple is the count, the second the analysis_type_id, and the last the user_id.
I'd like to place that into a dictionary, so i can retrieve the counts quickly: given a user_id and analysis_type_id. It would have to be a two-level dictionary. Is there any better structure?
To construct the two-level dictionary "by hand", I would code:
dict = {4:{1:4,5:3,10:2},5:{10:2}}
Where user_id is the first dict key level, analysis_type_id is the second (sub-) key, and the count is the value inside the dict.
How would I create the "double-depth" in dict keys through list comprehension?
Or do I need to resort to a nested for-loop, where I first iterate through unique user_id values, then find matching analysis_type_id and fill in the counts ... one-at-a-time into the dict?
Two Tuple Keys
I would suggest abandoning the idea of nesting dictionaries and simply using two-tuples as the keys directly. Like so:
d = { (user_id, analysis_type_id): count for count, analysis_type_id, user_id in counts}
A dictionary is a hash table. In Python, each two-tuple has a single hash value (not two hash values), so a two-tuple key is found with one lookup based on its (relatively) unique hash. That is typically faster than two successive lookups on separate keys (first the user_id, then the analysis_type_id), although the exact speedup depends on the workload.
However, beware of premature optimization. Unless you're doing millions of lookups, the increase in performance of the flat dict is unlikely to matter. The real reason to favor the two-tuple here is that the syntax and readability of a two-tuple solution are far superior to those of the other solutions, assuming that the vast majority of the time you will want to access items based on a pair of values and not groups of items based on a single value.
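A brief sketch of how lookups on the flat dict might look, using the sample counts from the question. The names count, missing and per_user are illustrative, and returning 0 for an absent pair is an assumption.

```python
counts = [(4, 1, 4), (3, 5, 4), (2, 10, 4), (2, 10, 5)]

# Flat dict keyed by (user_id, analysis_type_id).
d = {(user_id, analysis_type_id): count
     for count, analysis_type_id, user_id in counts}

# Direct lookup by pair.
count = d[(4, 5)]

# Lookup with a fallback for pairs that are not present.
missing = d.get((9, 9), 0)

# Grouped access needs a scan over all keys, which is the trade-off of a flat dict.
per_user = {a_id: c for (u_id, a_id), c in d.items() if u_id == 4}

print(count, missing, per_user)
```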
Consider Using a namedtuple
It may be convenient to create a named tuple for storing those keys. Do that this way:
from collections import namedtuple
IdPair = namedtuple("IdPair", "user_id, analysis_type_id")
Then use it in your dictionary comprehension:
d = { IdPair(user_id, analysis_type_id): count for count, analysis_type_id, user_id in counts}
And access a count you're interested in like this:
somepair = IdPair(user_id = 4, analysis_type_id = 1)
d[somepair]
The reason this is sometimes useful is you can do things like this:
user_id = somepair.user_id # very nice syntax
Some Other Useful Options
One downside of the above solution is the case in which your lookup fails. In that case, you will only get a traceback like the following:
>>> d[IdPair(0,0)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: IdPair(user_id=0, analysis_type_id=0)
This isn't very helpful; was it the user_id that was unmatched, or the analysis_type_id, or both?
You can create a better tool for yourself by creating your own dict type that gives you a nice traceback with more information. It might look something like this:
class CountsDict(dict):
    """A dict for storing IdPair keys and count values as integers.
    Provides more detailed traceback information than a regular dict.
    """
    def __getitem__(self, k):
        try:
            return super().__getitem__(k)
        except KeyError as exc:
            raise self._handle_bad_key(k, exc) from exc
    def _handle_bad_key(self, k, exc):
        """Provides a custom exception when a bad key is given."""
        try:
            user_id, analysis_type_id = k
        except (TypeError, ValueError):
            # k is not a two-tuple, so hand back the original KeyError
            return exc
        has_u_id = any(u_id == user_id for u_id, _ in self)
        has_at_id = any(at_id == analysis_type_id for _, at_id in self)
        exc_lookup = {(False, False): KeyError(f"CountsDict missing pair: {k}"),
                      (True, False): KeyError(f"CountsDict missing analysis_type_id: "
                                              f"{analysis_type_id}"),
                      (False, True): KeyError(f"CountsDict missing user_id: {user_id}")}
        # index by the two flags; both ids can exist separately without forming this pair
        return exc_lookup.get((has_u_id, has_at_id), exc)
Use it just like a regular dict.
However, it may make MORE sense to simply add new pairs to your dict (with a count of zero) when you try to access a missing pair. If this is the case, I'd use a defaultdict and have it set the count to zero (using the default value of int as the factory function) when a missing key is accessed. Like so:
from collections import defaultdict
my_dict = defaultdict(int,
    (((user_id, analysis_type_id), count) for count, analysis_type_id, user_id in counts))
Now if you attempt to access a key that is missing, the count will be set to zero. However, one problem with this method is that ANY missing key, even an invalid one, will be added with a count of zero:
value = my_dict["I'm not a two tuple, sucka!!!!"] # <-- will be added to my_dict
To prevent this, we go back to the idea of making a CountsDict, except in this case, your special dict will be a subclass of defaultdict. However, unlike a regular defaultdict, it will check to make sure the key is a valid kind before it is added. And as a bonus, we can make sure ANY two tuple that is added as a key becomes an IdPair.
from collections import defaultdict
class CountsDict(defaultdict):
    """A dict for storing IdPair keys and count values as integers.
    Missing two-tuple keys are converted to an IdPair. Invalid keys raise a KeyError.
    """
    def __getitem__(self, k):
        try:
            user_id, analysis_type_id = k
        except (TypeError, ValueError):
            raise KeyError(f"The provided key {k!r} is not a valid key.") from None
        else:
            # convert two tuple to an IdPair if it was not already
            k = IdPair(user_id, analysis_type_id)
        return super().__getitem__(k)
Use it just like the regular defaultdict:
my_dict = CountsDict(int,
    (((user_id, analysis_type_id), count) for count, analysis_type_id, user_id in counts))
NOTE: In the above I have not made it so that two tuple keys are converted to IdPairs upon instance creation (because __setitem__ is not utilized during instance creation). To create this functionality, we would also need to implement an override of the __init__ method.
Wrap Up
Out of all of these, the most useful option depends entirely on your use case.
The most readable solution uses a defaultdict, which saves you nested loops and clumsy checks for whether keys already exist:
from collections import defaultdict
dct = defaultdict(dict) # do not shadow the built-in 'dict'
for x, y, z in counts:
    dct[z][y] = x
dct
# defaultdict(dict, {4: {1: 4, 5: 3, 10: 2}, 5: {10: 2}})
If you really want a one-liner comprehension you can use itertools.groupby and this clunkiness:
from itertools import groupby
dct = {k: {y: x for x, y, _ in g} for k, g in groupby(sorted(counts, key=lambda c: c[2]), key=lambda c: c[2])}
If your initial data is already sorted by user_id, you can save yourself the sorting.
This is a good use for the defaultdict object. You can create a defaultdict whose elements are always dicts. Then you can just stuff the counts into the right dicts, like this:
from collections import defaultdict
counts = [(4, 1, 4), (3, 5, 4), (2, 10, 4), (2, 10, 5)]
dct = defaultdict(dict)
for count, analysis_type_id, user_id in counts:
    dct[user_id][analysis_type_id] = count
dct
# defaultdict(dict, {4: {1: 4, 5: 3, 10: 2}, 5: {10: 2}})
# if you want a 'normal' dict, you can finish with this:
dct = dict(dct)
Or you can just use standard dicts with setdefault:
counts = [(4, 1, 4), (3, 5, 4), (2, 10, 4), (2, 10, 5)]
dct = dict()
for count, analysis_type_id, user_id in counts:
    dct.setdefault(user_id, dict())
    dct[user_id][analysis_type_id] = count
dct
# {4: {1: 4, 5: 3, 10: 2}, 5: {10: 2}}
I don't think you can do this neatly with a list comprehension, but there's no need to be afraid of a for-loop for this kind of thing.
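As a small variation on the setdefault loop, and only a sketch: since setdefault returns the value stored under the key, the inner dict can be filled in the same statement.

```python
counts = [(4, 1, 4), (3, 5, 4), (2, 10, 4), (2, 10, 5)]

dct = {}
for count, analysis_type_id, user_id in counts:
    # setdefault returns the existing inner dict, or inserts and returns a new one.
    dct.setdefault(user_id, {})[analysis_type_id] = count

print(dct)
```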
You could use the following logic. There is no need to import any package; a nested comprehension is enough:
counts = [(4, 1, 4), (3, 5, 4), (2, 10, 4), (2, 10, 5)]
dct = {x[2]:{y[1]:y[0] for y in counts if x[2] == y[2]} for x in counts }
"""output will be {4: {1: 4, 5: 3, 10: 2}, 5: {10: 2}} """
You can use a comprehension with nested loops and a condition, and use one or more of the loop variables to select elements:
# create dict with tuples
line_dict = {str(nest_list[0]) : nest_list[1:] for nest_list in nest_lists for elem in nest_list if elem== nest_list[0]}
print(line_dict)
# create dict with list
line_dict1 = {str(nest_list[0]): list(nest_list[1:]) for nest_list in nest_lists for elem in nest_list if elem== nest_list[0]}
print(line_dict1)
Example: nest_lists = [("a","aa","aaa","aaaa"), ("b","bb","bbb","bbbb"), ("c","cc","ccc","cccc"), ("d","dd","ddd","dddd")]
Output: {'a': ('aa', 'aaa', 'aaaa'), 'b': ('bb', 'bbb', 'bbbb'), 'c': ('cc', 'ccc', 'cccc'), 'd': ('dd', 'ddd', 'dddd')}, {'a': ['aa', 'aaa', 'aaaa'], 'b': ['bb', 'bbb', 'bbbb'], 'c': ['cc', 'ccc', 'cccc'], 'd': ['dd', 'ddd', 'dddd']}

Passing String, integer and tuple information as key for python dictionary

I'm trying to create a python dictionary and I would like to use a key that contains strings, numerics & a list/tuple entry. The key should ideally look like
("stringA", "stringB", "stringC", integer1, (integer2, integer3, integer4))
I tried to create a namedtuple based on this documentation as follows
from collections import namedtuple
dictKey = namedtuple('dictKey', 'stringA stringB stringC integer1 (integer2 integer3 integer4)')
but it throws me a ValueError saying it can only contain alphanumeric characters and underscores. So
How can I create a dictionary key which contains a tuple?
How to effectively use the dictionary key (especially the tuple it contains) to retrieve information from the dictionary?
The issue here is with your namedtuple definition, not the dictionary key structure itself, which will work just fine, e.g.:
>>> d = {}
>>> d[('1', '2', 3, (4, 5))] = 'foo'
>>> d
{('1', '2', 3, (4, 5)): 'foo'}
When the namedtuple reads the field_names parameter, it thinks you're trying to create a field named (integer2, and doesn't realise that you mean it to be a nested tuple.
To define that structure in a namedtuple, you will instead have to have an attribute that is itself a tuple:
>>> from collections import namedtuple
>>> dictKey = namedtuple("dictKey", "stringA stringB stringC integer1 tuple1")
>>> key = dictKey("foo", "bar", "baz", 1, (2, 3, 4))
>>> d[key] = 'bar'
>>> d
{dictKey(stringA='foo', stringB='bar', stringC='baz', integer1=1, tuple1=(2, 3, 4)): 'bar',
('1', '2', 3, (4, 5)): 'foo'}
You can retrieve the value stored against the key exactly as you can for any other, either with the original namedtuple:
>>> d[key]
'bar'
or a new one:
>>> d[dictKey("foo", "bar", "baz", 1, (2, 3, 4))]
'bar'
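To address the second part of the question, here is a hedged sketch of using the fields of such a key, including the nested tuple, to select entries. DictKey, the sample entries and the filter conditions are all made up for illustration.

```python
from collections import namedtuple

DictKey = namedtuple("DictKey", "stringA stringB stringC integer1 tuple1")

d = {
    DictKey("foo", "bar", "baz", 1, (2, 3, 4)): "first",
    DictKey("foo", "qux", "baz", 2, (5, 6, 7)): "second",
}

# Select entries whose nested tuple contains a given integer.
matches = {k: v for k, v in d.items() if 3 in k.tuple1}

# Select values by one of the string fields.
values_for_foo = [v for k, v in d.items() if k.stringA == "foo"]

print(list(matches.values()), values_for_foo)
```

Filtering like this scans every key, so it suits occasional queries. For frequent lookups on one field, a second dict keyed by that field may be the better fit.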

Update method in Python dictionary

I was trying to update values in my dictionary, I came across 2 ways to do so:
product.update(map(key, value))
product.update(key, value)
What is the difference between them?
The difference is that the second method does not work:
>>> {}.update(1, 2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: update expected at most 1 arguments, got 2
dict.update() expects to find an iterable of key-value pairs, keyword arguments, or another dictionary:
Update the dictionary with the key/value pairs from other, overwriting existing keys. Return None.
update() accepts either another dictionary object or an iterable of key/value pairs (as tuples or other iterables of length two). If keyword arguments are specified, the dictionary is then updated with those key/value pairs: d.update(red=1, blue=2).
map() is a built-in function that applies its first argument, which must be a callable, to each element of the second (and any subsequent) arguments, which must be iterables. In Python 3 it returns a lazy iterator rather than a list. Unless your key object is a callable and your value object is an iterable, your first method will fail too.
Demo of a working map() application:
>>> def key(v):
...     return (v, v)
...
>>> value = range(3)
>>> list(map(key, value))  # map() returns an iterator in Python 3, so wrap it in list() to display it
[(0, 0), (1, 1), (2, 2)]
>>> product = {}
>>> product.update(map(key, value))
>>> product
{0: 0, 1: 1, 2: 2}
Here map() just produces key-value pairs, which satisfies the dict.update() expectations.
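For reference, a short sketch of the argument forms update() accepts, as described in the quoted documentation. The dictionary contents and the names keys and values are illustrative.

```python
product = {"a": 1}

# 1. Another dictionary.
product.update({"b": 2})

# 2. An iterable of key/value pairs.
product.update([("c", 3), ("d", 4)])

# 3. Keyword arguments (keys must be valid identifiers).
product.update(e=5)

# 4. Any iterable that yields pairs, such as zip().
keys = ["f", "g"]
values = [6, 7]
product.update(zip(keys, values))

print(product)
```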
Python 3.9 and PEP 584 introduce the dict union operator, for updating one dict from another dict.
Dict union will return a new dict consisting of the left operand merged with the right operand, each of which must be a dict (or an instance of a dict subclass). If a key appears in both operands, the last-seen value (i.e. that from the right-hand operand) wins.
See the Stack Overflow question "How do I merge two dictionaries in a single expression?" for merging with the new augmented assignment version (|=).
>>> d = {'spam': 1, 'eggs': 2, 'cheese': 3}
>>> e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}
>>> d | e
{'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
>>> e | d
{'aardvark': 'Ethel', 'spam': 1, 'eggs': 2, 'cheese': 3}
Additional examples from the PEP.
Motivation
The current ways to merge two dicts have several disadvantages:
dict.update
d1.update(d2) modifies d1 in-place. e = d1.copy(); e.update(d2) is not an expression and needs a temporary variable.
{**d1, **d2}
Dict unpacking looks ugly and is not easily discoverable. Few people would be able to guess what it means the first time they see it, or think of it as the "obvious way" to merge two dicts.
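For comparison, a minimal sketch of the augmented-assignment form next to the union operator. It assumes Python 3.9 or later, and the variable names merged and original are illustrative.

```python
d = {'spam': 1, 'eggs': 2, 'cheese': 3}
e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}

# d | e builds a new dict and leaves both operands unchanged.
merged = d | e
original = dict(d)

# d |= e updates d in place, like d.update(e).
d |= e

print(merged)
print(d == merged, original == merged)
```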
