I'm looking for a way to name a value within an expression so that I can use it multiple times within that expression. Since the value is computed inside the expression, I can't save it in a variable with an ordinary assignment statement. I also want its use to stay in the same function as the rest of the expression, so I would rather not break it out into a separate function.
More specifically, I enjoy comprehensions; list and dictionary comprehensions are my favorite Python feature. I'm trying to use both to coerce a dictionary of untrusted structure into a trusted structure (all fields exist, and their values are of the correct type). Without what I'm looking for, it would look something like this:
{
    ...
    'outer': [{
        ...
        'inner': {
            key: {
                ...
                'foo': {
                    'a': get_foo_from_value(value)['a'],
                    'b': get_foo_from_value(value)['b'],
                    ...
                }
            } for key, value in get_inner_from_outer(outer)
        }
    } for outer in get_outer_from_dictionary(dictionary)]
}
Those function calls are actually expressions, but I would like to only evaluate get_foo_from_value(value) once. Ideally there would be something like this:
'foo': {
    'a': foo['a'],
    'b': foo['b'],
    ...
} with get_foo_from_value(value) as foo
So far the options I've come up with are single-item generators and lambda expressions. I'm going to include an example of each as an answer so they can be discussed separately.
lambda solution
'foo': (lambda foo: {
    'a': foo['a'],
    'b': foo['b'],
    ...
})(get_foo_from_value(value))
I feel like this one isn't as readable as it could be. I also don't like creating a lambda that only gets called once. I like that the name appears before it's used, but I don't like the separation of its name and value.
single-item generator solution
This is currently my favorite solution to the problem (I like comprehension, remember).
'foo': next({
    'a': foo['a'],
    'b': foo['b'],
    ...
} for foo in [get_foo_from_value(value)])
I like it because the generator expression matches the rest of the comprehension in the expression, but I'm not a huge fan of the next and having to wrap get_foo_from_value(value) in brackets.
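For comparison, here is a runnable sketch of the single-item generator next to two variants that avoid next(): an extra for clause over a one-element list, and (on Python 3.8+ only) an assignment expression. The get_foo_from_value below is a hypothetical stand-in that records its calls, so you can check that each variant evaluates it once per item; the sample data is made up.

```python
calls = []

def get_foo_from_value(value):
    # Hypothetical stand-in for the question's expression; records every call.
    calls.append(value)
    return {"a": value * 2, "b": value * 3}

inner = {"x": 1, "y": 2}  # made-up sample data

# Single-item generator, as in the post.
via_next = {
    key: next({"a": foo["a"], "b": foo["b"]} for foo in [get_foo_from_value(value)])
    for key, value in inner.items()
}

# Same idea without next(): an extra `for` clause over a one-element list.
via_for = {
    key: {"a": foo["a"], "b": foo["b"]}
    for key, value in inner.items()
    for foo in [get_foo_from_value(value)]
}

# Python 3.8+ only: bind the name with an assignment expression.
via_walrus = {
    key: {"a": (foo := get_foo_from_value(value))["a"], "b": foo["b"]}
    for key, value in inner.items()
}
```

The extra for clause only works where the dictionary is the element of a comprehension; the assignment expression works in any expression, but the name it binds leaks into the enclosing scope.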
Related
There are some very strange JSON payloads that I need to parse, and I'm completely stuck.
Say I have a nested dictionary with lists that looks like this:
test_dict1 = {
    "blah": "blah",
    "alerts": [{"test1": "1", "test": "2"}],
    "foo": {
        "foo": "bar",
        "foo1": [{"test3": "3"}]
    }
}
Is there a function that would give me the value of the key test3, or rather the value for the first occurrence of the key test3?
Edit
What I meant was a function that lets me search for the key test3, since I am mainly concerned with the key, and the different dictionaries that I could get may have different structures.
Since you do not know how deep inside the value is, it is probably advisable to use a recursive function to iterate through all the layers until it's found. I used DFS below.
def search(ld, find):
    """Depth-first search for the first value stored under the key `find`."""
    if isinstance(ld, list):
        for item in ld:
            if isinstance(item, (list, dict)):
                result = search(item, find)
                if result is not None:
                    return result
    elif isinstance(ld, dict):
        if find in ld:
            return ld[find]
        # Key not at this level: descend into nested lists and dicts only.
        for value in ld.values():
            if isinstance(value, (list, dict)):
                result = search(value, find)
                if result is not None:
                    return result
    return None

test_dict1 = {
    "blah": "blah",
    "alerts": [{"test1": "1", "test": "2"}],
    "foo": {
        "foo": "bar",
        "foo1": [{"test3": "3"}]
    }
}
print(search(test_dict1, "test3"))
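One limitation of returning None for "not found" is that a key whose stored value really is None looks the same as a missing key. A possible variant, sketched under that assumption, uses a private sentinel instead. The names find_first and _MISSING are made up for this illustration.

```python
_MISSING = object()  # sentinel: distinguishes "not found" from a stored None

def find_first(node, key, default=None):
    """Return the first value stored under `key` (depth-first), else `default`."""
    def walk(n):
        if isinstance(n, dict):
            if key in n:
                return n[key]
            children = n.values()
        elif isinstance(n, list):
            children = n
        else:
            return _MISSING  # scalars cannot contain the key
        for child in children:
            found = walk(child)
            if found is not _MISSING:
                return found
        return _MISSING

    result = walk(node)
    return default if result is _MISSING else result

payload = {"blah": "blah", "foo": {"foo1": [{"test3": "3"}]}}
value = find_first(payload, "test3")
```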
Just access it like you would any other nested structure:
test_dict1["foo"]["foo1"][0]["test3"]
Also, what do you mean by the first occurrence? Dictionaries only guarantee insertion order from Python 3.7 onward, so on older versions "first" isn't well defined.
If you only want the value of test3, then this is how you can get it:
test_dict1["foo"]["foo1"][0]["test3"]
But if you want to get the value dynamically, you will need a different approach.
Note that you use a key name to index into a dictionary and an integer index to index into a list.
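If the path is known but only at run time, one way to apply that rule (keys for dictionaries, integer indexes for lists) is to fold over a list of steps. This is just a sketch; get_path is a made-up helper name.

```python
from functools import reduce
import operator

def get_path(data, path):
    # Apply data[step] for each step in turn; steps are dict keys or list indexes.
    return reduce(operator.getitem, path, data)

test_dict1 = {
    "blah": "blah",
    "alerts": [{"test1": "1", "test": "2"}],
    "foo": {"foo": "bar", "foo1": [{"test3": "3"}]},
}
value = get_path(test_dict1, ["foo", "foo1", 0, "test3"])
```

A wrong step raises KeyError, IndexError, or TypeError, just as the hand-written chain of subscripts would.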
I'm building some helper functions for other people to use when writing their pytest tests.
One thing that is going to be done frequently in these tests is to ask for a country's quota of a material, where the material might be "spam", "eggs", "sausage", etc.
It so happens that the best way to compute quotas is to do a database query and some post-processing that ends up returning all known quotas. This generates a dictionary of most (but not all) of the country codes, each of which is a dictionary with most (but not all) of the materials as keys:
{ 'CA': { 'spam': 100, 'eggs': 50 },
  'US': { 'sausage': 25, 'spam': 100 },
  ... }
The reason I want to build a helper function, instead of just providing quotas directly to them as a fixture, is to protect the writers of individual tests from having to always say something like
def my_test( quotas ):
    if 'XY' not in quotas or 'spam' not in quotas['XY']:
        the_quota = 0
    else:
        the_quota = quotas['XY']['spam']
    ...
(and the reason they would have to do this is because we don't know up-front what all the possible materials are, so I can't just build a simple fixture that makes sure the dictionary is fully populated.)
I would just like them to be able to do
from helper_functions import quota

def my_test( ):
    the_quota = quota( 'XY','spam' )

and have my quota function handle the gory details.
How do I do that, though? Or am I thinking about this all wrong?
I decided I don't understand pytest very well and am also thinking about it all wrong.
There are probably fancier and/or more 21st-century-object-y ways to do it, but I'm going to provide a test function that uses a module global variable to contain the initialized dictionary.
file helper_functions.py:
__quotas = {}

def lookup_quota( country, resource ):
    # Fill the module-level cache on first use; initialize_quotas() is expected to populate __quotas.
    if not __quotas:
        initialize_quotas()
    if country not in __quotas or resource not in __quotas[country]:
        return 0
    return __quotas[country][resource]
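A possible variation on the same idea, as a sketch only: cache the loaded dictionary with functools.lru_cache and do the lookup with chained dict.get calls. The load_quotas function and its hard-coded data are stand-ins for the real database query, which isn't shown in the post.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def load_quotas():
    # Hypothetical stand-in for the database query and post-processing.
    return {
        "CA": {"spam": 100, "eggs": 50},
        "US": {"sausage": 25, "spam": 100},
    }

def quota(country, material):
    # Missing country -> empty dict; missing material -> 0.
    return load_quotas().get(country, {}).get(material, 0)

known = quota("CA", "spam")
unknown_country = quota("XY", "spam")
unknown_material = quota("US", "eggs")
```

Because the result is cached, the expensive load runs once per test session no matter how many tests call quota.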
The easiest way I can see for solving this is to use a defaultdict.
from collections import defaultdict
quotas = {
    'CA': { 'spam': 100, 'eggs': 50 },
    'US': { 'sausage': 25, 'spam': 100 },
}
default_quotas = defaultdict(lambda: defaultdict(int))
# default_quotas.update(quotas) # doesn't work
for place, q in quotas.items():
    default_quotas[place].update(q)
print(default_quotas["CA"]["spam"]) # 100
print(default_quotas["CA"]["bacon"]) # 0
print(default_quotas["nowhere"]["nothing"]) # 0
Notes:
I don't like having that loop to initialize default_quotas, but with the commented-out line the inner dictionaries are regular dicts, not defaultdicts.
In case you haven't seen lambdas before, lambda: defaultdict(int) is an anonymous function that returns a defaultdict of ints, so that the top-level structure is a defaultdict of defaultdicts of ints.
Warning: because I'm using defaultdict, anything you query gets inserted, so that after my three print statements those printed 0 values are actually stored in default_quotas.
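The warning above is easy to see in a small experiment. This sketch uses made-up data and also shows that .get() does not trigger the default factory, so it can be used when you want to look without inserting.

```python
from collections import defaultdict

dq = defaultdict(lambda: defaultdict(int))
dq["CA"]["spam"] = 100

before = "nowhere" in dq          # nothing has been looked up yet
zero = dq["nowhere"]["nothing"]   # the lookup itself creates both levels
after = "nowhere" in dq

# .get() bypasses __missing__, so nothing is inserted for "elsewhere".
peek = dq.get("elsewhere")
```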
I'd like to create a dict in a list comprehension that might not have some key. So far, I came up with this, but it looks rather ugly
{
    "foo": 1,
    <more fields>,
    **({"bar": 2} if bar else {})
}
Alternatively,
dict(
    foo=1,
    <more fields>,
    **({"bar": 2} if bar else {})
)
Is there a cleaner way to do this? I'm looking for an expression for a dict where some keys might not be present based on a condition.
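One alternative that stays a single expression, sketched here under the assumption that None is never a legitimate value in the result, is to list every candidate pair and filter out the ones marked as absent. The make_record name is made up for the example.

```python
def make_record(bar):
    # None marks "leave this key out"; this assumes None is never a real value.
    candidates = {"foo": 1, "bar": 2 if bar else None}
    return {key: value for key, value in candidates.items() if value is not None}

with_bar = make_record(True)
without_bar = make_record(False)
```

Whether this reads better than the ** version is a matter of taste; it mostly pays off when several keys are optional.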
If I have a dictionary that is nested, and I pass in a string like "key1.key2.key3" which would translate to:
myDict["key1"]["key2"]["key3"]
What would be an elegant way to be able to have a method where I could pass on that string and it would translate to that key assignment? Something like
myDict.set_nested('key1.key2.key3', someValue)
Using only builtin stuff:
def set_nested(my_dict, key_string, value):
    """Given `foo`, 'key1.key2.key3', 'something', set foo['key1']['key2']['key3'] = 'something'"""
    # Start off pointing at the original dictionary that was passed in.
    here = my_dict
    # Turn the string of key names into a list of strings.
    keys = key_string.split(".")
    # For every key *before* the last one, we concentrate on navigating through the dictionary.
    for key in keys[:-1]:
        # Try to find here[key]. If it doesn't exist, create it with an empty dictionary. Then,
        # update our `here` pointer to refer to the thing we just found (or created).
        here = here.setdefault(key, {})
    # Finally, set the final key to the given value
    here[keys[-1]] = value

# Named set_nested rather than set, so the built-in set type is not shadowed.
myDict = {}
set_nested(myDict, "key1.key2.key3", "some_value")
assert myDict == {"key1": {"key2": {"key3": "some_value"}}}
This traverses myDict one key at a time, ensuring that each sub-key refers to a nested dictionary.
You could also solve this recursively, but then you risk RecursionError exceptions without any real benefit.
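The matching read operation can be written with the same loop. This is a sketch of a companion getter (get_nested is a made-up name) that returns a default instead of raising when any key along the path is missing.

```python
def get_nested(my_dict, key_string, default=None):
    """Look up 'key1.key2.key3' in nested dicts, returning `default` if any key is missing."""
    here = my_dict
    for key in key_string.split("."):
        # Stop as soon as we hit a non-dict or a missing key.
        if not isinstance(here, dict) or key not in here:
            return default
        here = here[key]
    return here

data = {"key1": {"key2": {"key3": "some_value"}}}
found = get_nested(data, "key1.key2.key3")
missing = get_nested(data, "key1.nope.key3", default="n/a")
```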
There are a number of existing modules that will already do this, or something very much like it. For example, the jmespath module will resolve jmespath expressions, so given:
>>> mydict={'key1': {'key2': {'key3': 'value'}}}
You can run:
>>> import jmespath
>>> jmespath.search('key1.key2.key3', mydict)
'value'
The jsonpointer module does something similar, although it uses "/" as a separator instead of ".".
Given the number of pre-existing modules I would avoid trying to write your own code to do this.
EDIT: OP's clarification makes it clear that this answer isn't what he's looking for. I'm leaving it up here for people who find it by title.
I implemented a class that did this a while back... it should serve your purposes.
I achieved this by overriding the default getattr/setattr functions for an object.
Check it out! AndroxxTraxxon/cfgutils
This lets you do some code like the following...
from cfgutils import obj
a = obj({
    "b": 123,
    "c": "apple",
    "d": {
        "e": "nested dictionary value"
    }
})
print(a.d.e)  # prints: nested dictionary value
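To give an idea of how overriding getattr/setattr can produce that behavior, here is a minimal sketch. It is not the actual cfgutils implementation; the AttrDict class and everything in it are assumptions made for illustration.

```python
class AttrDict:
    """Hypothetical minimal wrapper; NOT the actual cfgutils implementation."""

    def __init__(self, data):
        # Bypass our own __setattr__ so the backing dict lands on the instance.
        object.__setattr__(self, "_data", data)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            value = self._data[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dicts so attribute access can be chained.
        return AttrDict(value) if isinstance(value, dict) else value

    def __setattr__(self, name, value):
        self._data[name] = value

a = AttrDict({"b": 123, "c": "apple", "d": {"e": "nested dictionary value"}})
nested = a.d.e
a.d.e = "changed"   # writes through to the original nested dict
```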
I've found how to split a delimited string into key:value pairs in a dictionary elsewhere, but I have an incoming string that also includes two parameters that amount to dictionaries themselves: parameters with one or three key:value pairs inside:
clientid=b59694bf-c7c1-4a3a-8cd5-6dad69f4abb0&keyid=987654321&userdata=ip:192.168.10.10,deviceid:1234,optdata:75BCD15&md=AMT-Cam:avatar&playbackmode=st&ver=6&sessionid=&mk=PC&junketid=1342177342&version=6.7.8.9012
Obviously these are dummy parameters to obfuscate proprietary code, here. I'd like to dump all this into a dictionary with the userdata and md keys' values being dictionaries themselves:
requestdict = {'clientid' : 'b59694bf-c7c1-4a3a-8cd5-6dad69f4abb0', 'keyid' : '987654321', 'userdata' : {'ip' : '192.168.10.10', 'deviceid' : '1234', 'optdata' : '75BCD15'}, 'md' : {'Cam' : 'avatar'}, 'playbackmode' : 'st', 'ver' : '6', 'sessionid' : '', 'mk' : 'PC', 'junketid' : '1342177342', 'version' : '6.7.8.9012'}
Can I take the slick two-level delimitation parsing command that I've found:
requestDict = dict(line.split('=') for line in clientRequest.split('&'))
and add a third level to it to handle & preserve the 2nd-level dictionaries? What would the syntax be? If not, I suppose I'll have to split by & and then check & handle splits that contain : but even then I can't figure out the syntax. Can someone help? Thanks!
I basically took Kyle's answer and made it more future-friendly:
def dictelem(input):
    parts = input.split('&')
    listing = [part.split('=') for part in parts]
    result = {}
    for entry in listing:
        # Rejoin with '=' so a value that itself contains '=' is preserved.
        head, tail = entry[0], '='.join(entry[1:])
        if ':' in tail:
            entries = tail.split(',')
            result.update({head: dict(e.split(':') for e in entries)})
        else:
            result.update({head: tail})
    return result
Here's a two-liner that does what I think you want:
dictelem = lambda x: x if ':' not in x[1] else [x[0],dict(y.split(':') for y in x[1].split(','))]
a = dict(dictelem(x.split('=')) for x in input.split('&'))
Can I take the slick two-level delimitation parsing command that I've found:
requestDict = dict(line.split('=') for line in clientRequest.split('&'))
and add a third level to it to handle & preserve the 2nd-level dictionaries?
Of course you can, but (a) you probably don't want to, because nested comprehensions beyond two levels tend to get unreadable, and (b) this super-simple syntax won't work for cases like yours, where only some of the data can be turned into a dict.
For example, what should happen with 'PC'? Do you want to make that into {'PC': None}? Or maybe the set {'PC'}? Or the list ['PC']? Or just leave it alone? You have to decide, and write the logic for that, and trying to write it as an expression will make your decision very hard to read.
So, let's put that logic in a separate function:
def parseCommasAndColons(s):
    bits = [bit.split(':') for bit in s.split(',')]
    try:
        return dict(bits)
    except ValueError:
        return bits
This will return a dict like {'ip': '192.168.10.10', 'deviceid': '1234', 'optdata': '75BCD15'} or {'AMT-Cam': 'avatar'} for cases where each comma-separated component has a colon inside it, but a list of lists like [['1342177342']] for cases where any of them don't.
Even this may be a little too clever; I might make the "is this in dictionary format" check more explicit instead of just trying to convert the list of lists and see what happens.
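A more explicit version of that check might look like the sketch below. The rule it uses (every comma-separated part contains exactly one colon) and the choice to return the original string otherwise are assumptions, not something the question specifies.

```python
def parse_commas_and_colons(s):
    parts = s.split(',')
    # Assumed rule: treat the value as a dict only if every part is "key:value".
    if all(part.count(':') == 1 for part in parts):
        return dict(part.split(':') for part in parts)
    # Otherwise leave the value alone (a design choice for this sketch).
    return s

as_dict = parse_commas_and_colons('ip:192.168.10.10,deviceid:1234,optdata:75BCD15')
as_text = parse_commas_and_colons('PC')
```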
Either way, how would you put that back into your original comprehension?
Well, you want to call it on the value in the line.split('='). So let's add a function for that:
def parseCommasAndColonsForValue(keyvalue):
    if len(keyvalue) == 2:
        return keyvalue[0], parseCommasAndColons(keyvalue[1])
    else:
        return keyvalue

requestDict = dict(parseCommasAndColonsForValue(line.split('='))
                   for line in clientRequest.split('&'))
One last thing: Unless you need to run on older versions of Python, you shouldn't often be calling dict on a generator expression. If it can be rewritten as a dictionary comprehension, it will almost certainly be clearer that way, and if it can't be rewritten as a dictionary comprehension, it probably shouldn't be a 1-liner expression in the first place.
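As an illustration of that last point, the original two-level parse can be written as a dictionary comprehension. This sketch uses str.partition so that each pair always splits into exactly a key and a value, even when the value is empty or contains another "="; the sample string is made up.

```python
client_request = "ver=6&sessionid=&token=a=b"  # made-up sample input

# partition('=') always yields (key, '=', value), splitting on the first '=' only.
request_dict = {
    key: value
    for key, _, value in (pair.partition("=") for pair in client_request.split("&"))
}
```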
Of course breaking expressions up into separate expressions, turning some of them into statements or even functions, and naming them does make your code longer—but that doesn't necessarily mean worse. About half of the Zen of Python (import this) is devoted to explaining why. Or one quote from Guido: "Python is a bad language for code golf, on purpose."
If you really want to know what it would look like, let's break it into two steps:
>>> {k: [bit2.split(':') for bit2 in v.split(',')] for k, v in (bit.split('=') for bit in s.split('&'))}
{'clientid': [['b59694bf-c7c1-4a3a-8cd5-6dad69f4abb0']],
'junketid': [['1342177342']],
'keyid': [['987654321']],
'md': [['AMT-Cam', 'avatar']],
'mk': [['PC']],
'playbackmode': [['st']],
'sessionid': [['']],
'userdata': [['ip', '192.168.10.10'],
['deviceid', '1234'],
['optdata', '75BCD15']],
'ver': [['6']],
'version': [['6.7.8.9012']]}
That illustrates why you can't just add a dict call for the inner level—because most of those things aren't actually dictionaries, because they had no colons. If you changed that, then it would just be this:
{k: dict(bit2.split(':') for bit2 in v.split(',')) for k, v in (bit.split('=') for bit in s.split('&'))}
I don't think that's very readable, and I doubt most Python programmers would. Reading it 6 months from now and trying to figure out what I meant would take a lot more effort than writing it did.
And trying to debug it will not be fun. What happens if you run that on your input, with missing colons? ValueError: dictionary update sequence element #0 has length 1; 2 is required. Which sequence? No idea. You have to break it down step by step to see what doesn't work. That's no fun.
So, hopefully that illustrates why you don't want to do this.