python .get() and None - python

I love python one liners:
u = payload.get("actor", {}).get("username", "")
Problem I face is, I have no control over what 'payload' contains, other than knowing it is a dictionary. So, if 'payload' does not have "actor", or it does and actor does or doesn't have "username", this one-liner is fine.
Problem of course arises when payload DOES have actor, but actor is not a dictionary.
Is there as pretty a way to do this comprehensively as a one liner, and consider the possibility that 'actor' may not be a dictionary?
Of course I can check the type using 'isinstance', but that's not as nice.
I'm not requiring a one liner per se, just asking for the most efficient way to ensure 'u' gets populated, without exception, and without prior knowledge of what exactly is in 'payload'.

Using EAFP
As xnx suggested, you can take advantage of the following python paradigm:
Easier to ask for forgiveness than permission
You can use it for KeyError as well. Subscripting a non-dict such as a string or None raises TypeError, so catch that too:
try:
    u = payload["actor"]["username"]
except (KeyError, TypeError):
    u = ""
Using a wrapper with forgiving indexing
Sometimes it would be nice to have something like null-conditional operators in Python. With some helper class this can be compressed into a one-liner expression:
class Forgive:
    def __init__(self, value=None):
        self.value = value
    def __getitem__(self, name):
        if self.value is None:
            return Forgive()
        try:
            return Forgive(self.value[name])
        except (KeyError, IndexError, TypeError):
            # missing key or index, or the value cannot be subscripted this way
            return Forgive()
    def get(self, default=None):
        return default if self.value is None else self.value
data = {'actor': {'username': 'Joe'}}
print(Forgive(data)['actor']['username'].get('default1'))
print(Forgive(data)['actor']['address'].get('default2'))
P.S. One could define __getattr__ in addition to __getitem__, so you could even write Forgive(data)['actor'].username.get('default1').
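The __getattr__ idea from the postscript above could look roughly like the sketch below. This is not the answer author's code: the exception tuple and the guard for dunder names are assumptions added here so that the example stays self-contained and does not interfere with Python's own special-method lookups.

```python
class Forgive:
    def __init__(self, value=None):
        self.value = value

    def __getitem__(self, name):
        try:
            return Forgive(self.value[name])
        except (KeyError, IndexError, TypeError):
            # Missing key/index, or value (possibly None) is not subscriptable.
            return Forgive()

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so `value` and
        # `get` still resolve as usual. Dunder names are left alone.
        if name.startswith("__"):
            raise AttributeError(name)
        return self[name]

    def get(self, default=None):
        return default if self.value is None else self.value


data = {'actor': {'username': 'Joe'}}
print(Forgive(data)['actor'].username.get('default1'))  # Joe
print(Forgive(data).actor.address.get('default2'))      # default2
```

One trade-off of this design: a key that happens to be called "get" or "value" can only be reached with the bracket syntax.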

Why not use an Exception:
try:
    u = payload.get("actor", {}).get("username", "")
except AttributeError:
    u = ""

The answer hege_hegedus gave is correct; however, there's one caveat: handling a raised exception is a lot slower than going through an if/else statement, even though a try block that doesn't raise costs very little.
For example, if you're iterating over thousands of payload objects and an actor entry is only occasionally not a dictionary, that code is perfectly fine.
However, if you're iterating over thousands of payload objects and every other actor entry is not a dictionary, then you'd be better off with this code:
u = ''
if 'actor' in payload and isinstance(payload['actor'], dict):
    u = payload['actor'].get('username', '')
For more discussion go here -- https://mail.python.org/pipermail/tutor/2011-January/081143.html
UPDATE
Also, the statement can be rewritten as a one-liner, albeit one not nearly as legible as the two-line version:
u = payload['actor'].get('username', '') if 'actor' in payload and isinstance(payload['actor'], dict) else ''

If you really need to do it in one line, you'll have to implement the functionality yourself, which is worth doing if you use these semantics many times in your program.
There are two ways to do it: a function, or a custom dictionary-like object for payload.
1) A function handles the case of actor not being a dict. It can check with isinstance, use try, or do whatever else; the mechanism is not essential. The usage would look something like u = get("username", "", payload.get("actor", {})) or u = get("", payload, 'actor', 'username') (with an arbitrary number of nested keys for items in payload).
2) A class of custom objects is a powerful thing; do it if you can and really need this abstraction in the program. A descendant of dict or UserDict (in Python 3) can check what it stores or what it returns on __getitem__ calls.
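As a minimal sketch of option 1, assuming the second call shape mentioned above (get(default, obj, *keys)); the name get and the isinstance-based check are illustrative choices, not something the answer prescribes:

```python
def get(default, obj, *keys):
    """Walk obj through keys; return default as soon as a step is impossible."""
    for key in keys:
        # Stop if the current level is not a dict or lacks the key.
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj


print(get("", {"actor": {"username": "joe"}}, "actor", "username"))  # joe
print(repr(get("", {"actor": "not a dict"}, "actor", "username")))   # ''
print(repr(get("", {}, "actor", "username")))                        # ''
```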

Related

Python: Safe dictionary access with lists?

Is there an exception free way to access values from a dictionary containing lists. For example, if I have:
data = {
    "object_1": {
        "object_2": {
            "list": [
                {
                    "property": "hello"
                }
            ]
        }
    }
}
How do I access the path data['object_1']['object_2']['list'][0]['property'] safely (i.e. return some default value if access is not possible, without throwing an error)? I am trying to avoid wrapping these in try-excepts. I have seen the reduce-based approach, but it doesn't take into account having lists inside the dictionary.
In JS, I can write something like:
data.object_1?.object_2?.list[0]?.property ?? 'nothing_found'
Is there something similar in Python?
For dict you can use the get method. For lists you can just be careful with the index:
data.get('object_1', {}).get('object_2', {}).get('list', [{}])[0].get('property', default)
This is a bit awkward because it makes a new temporary dict or list for each get call. It's also not super safe for lists, which don't have an equivalent method.
You can wrap the getter in a small routine to support lists too, but it's not really worth it. You're better off writing a one-off utility function that uses either exception handling or preliminary checking to handle the cases you want to react to:
def get(obj, *keys, default=None):
    for key in keys:
        try:
            obj = obj[key]
        except (KeyError, IndexError):
            return default
    return obj
Exception handling has a couple of huge advantages over doing it the other way. For one thing, you don't have to do separate checks on the key depending on whether the object is a dict or list. For another, you can support almost any other reasonable type that supports __getitem__ indexing. To show what I mean, here is the asking for permission rather than forgiveness approach:
from collections.abc import Mapping, Sequence
from operator import index
def get(obj, *keys, default=None):
    for key in keys:
        if isinstance(obj, Mapping):
            if key not in obj:
                return default
        elif isinstance(obj, Sequence):
            try:
                idx = index(key)
            except TypeError:
                return default
            if len(obj) <= idx or len(obj) < -idx:
                return default
        obj = obj[key]
    return obj
Observe how awkward and error-prone the checking is. Try passing in a custom object instead of a list, or a key that's not an integer. In Python, carefully used exceptions are your friend, and there's a reason it's pythonic to ask for forgiveness rather than for permission.
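To see the exception-based helper at work on the question's data, here is a small usage sketch. One assumption is added beyond the answer's code: TypeError is also caught, so that hitting None (or another non-subscriptable value) midway returns the default instead of raising.

```python
def get(obj, *keys, default=None):
    for key in keys:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            # TypeError is an addition here: it covers None and other
            # values that cannot be subscripted with this key.
            return default
    return obj


data = {"object_1": {"object_2": {"list": [{"property": "hello"}]}}}

print(get(data, "object_1", "object_2", "list", 0, "property"))  # hello
print(get(data, "object_1", "object_2", "list", 5, "property",
          default="nothing_found"))                              # nothing_found
print(get(data, "object_1", "missing", "list", 0,
          default="nothing_found"))                              # nothing_found
```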
Ugh. Yeah, accessing such JSON data structures is just terrible; it's a bit awkward.
Glom to the rescue!
There are two ways to win:
1) Specify default=None in the glom() call to avoid exceptions, or
2) Use Coalesce.
from glom import glom
print(glom(data, ('object_1.object_2.list', ['property']), default=None))
In the code below, x will be None if any of the 'object_1', 'object_2', or 'list' keys does not exist.
If the 'list' key is present, x is not None; we then check that the list is non-empty before looking up the 'property' key in its first element.
x = data.get('object_1', {}).get('object_2', {}).get('list')
if x is not None and len(x) > 0:
    print(x[0].get('property'))
else:
    print(None)
One way to do that uses the get method, but it involves a lot of checking or temporary values.
One example lookup function would look like this:
def lookup(data, default=None):
    object_1 = data.get("object_1")
    if object_1 is None:
        return default  # return your default
    object_2 = object_1.get('object_2')
    # and so on...
In Python 3.10 and above, there is also structural pattern matching that can help, in which case you would do something like this:
match data:
    case {'object_1': {'object_2': {'list': [{'property': x}]}}}:
        print(x)  # should print 'hello'
    case _:
        print('nothing_found')  # your default
Please remember that this requires Python 3.10 or later; on earlier versions the code above is a syntax error (at the time of writing, the online Python console on python.org was still on Python 3.9).

Is it possible to determine which level/key of a nested dict that contained None, causing 'NoneType' object is not subscriptable?

The users of my framework (who may or may not be well versed in Python) write code that navigates a dict (that originally came from a json response from some API).
Sometimes they make a mistake, or sometimes the API returns data with some value missing, and they get the dreaded 'NoneType' object is not subscriptable
How can I make it clear at what level the error occurred (i.e. which key returned None)?
import json
def user_code(some_dict):
    # I can't modify this code, it is written by the user
    something = some_dict["a"]["b"]["c"]
# I don't control the contents of this.
data_from_api = '{"a": {"b": null}}'
# framework code, I control this
try:
    user_code(json.loads(data_from_api))
except TypeError as e:
    # I'd like to print an error message containing "a","b" here
    pass
I can overload/alter the dict implementation if necessary, but I don't want to do source code inspection.
There may already be answers to this question (or maybe it is impossible), but it is terribly hard to find among all the basic Why am I getting 'NoneType' object is not subscriptable? questions. My apologies if this is a duplicate.
Edit: @2e0byo's answer is the most correct answer to my original question, but I found that autoviv provides a nice solution to my "real" underlying issue (allowing users to easily navigate a dict that sometimes doesn't have all the expected data), so I chose that approach instead. The only real downside is that code relying on some_dict["a"]["b"]["c"] to throw an exception no longer works. My solution is something like this:
import autoviv
def user_code(some_dict):
    # this doesn't crash anymore, and instead sets something to None
    something = some_dict["a"]["b"]["c"]
# I don't control the contents of this.
data_from_api = '{"a": {"b": null}}'
# framework code, I control this
user_code(autoviv.loads(data_from_api))
Here is one approach to this problem: make your code return a custom Result() object wrapping each object. (This approach could be generalised to a monad approach with .left() and .right(), but I didn't go there as I don't see that pattern very often (in my admittedly small experience!).)
Example Code
Firstly the custom Result() object:
class Result:
    def __init__(self, val):
        self._val = val
    def __getitem__(self, k):
        try:
            return self._val[k]
        except KeyError:
            raise Exception("No such key")
        except TypeError:
            raise Exception(
                "Result is None. This probably indicates an error in your code."
            )
    def __getattr__(self, a):
        try:
            # look up the attribute named by `a`, not a literal attribute called "a"
            return getattr(self._val, a)
        except AttributeError:
            if self._val is None:
                raise Exception(
                    "Result is None. This probably indicates an error in your code."
                )
            else:
                raise Exception(
                    f"No such attribute for value of type {type(self._val)}, valid attributes are {dir(self._val)}"
                )
    @property
    def val(self):
        return self._val
Of course, there's a lot of room for improvement here (e.g. __repr__() and you might want to modify the error messages).
In action:
def to_result(thing):
    if isinstance(thing, dict):
        return Result({k: to_result(v) for k, v in thing.items()})
    else:
        return Result(thing)
d = {"a": {"b": None}}
r_dd = to_result(d)
r_dd["a"]  # Returns a Result object
r_dd["a"]["b"]  # Returns a Result object
r_dd["a"]["c"]  # Raises a helpful error
r_dd["a"]["b"]["c"]  # Raises a helpful error
r_dd["a"]["b"].val  # None
r_dd["a"]["b"].nosuchattr  # Raises a helpful error
Reasoning
If I'm going to serve up a custom object I want my users to know it's a custom object. So we have a wrapper class, and we tell users that the paradigm is 'get at the object, and then use .val to get the result'. Handling the wrong .val is their code's problem (so if .val is None, they have to handle that). But handling a problem in the data structure is sort of our problem, so we hand them a custom class with helpful messages rather than anything else.
Getting the level of the nested error
As currently implemented, it's easy to report one level up in the error message (for dict lookups). If you want more than that, you need to keep a reference up the hierarchy in the Result, which might be better written with Result as something other than just a wrapper.
I'm not sure if this is the kind of solution you were looking for, but it might be a step in the right direction.
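One way to keep that reference up the hierarchy is to have each wrapper remember the keys used to reach it. The sketch below is only an illustration of the idea: the class name PathTracked, its lazy wrapping on access, and the error wording are all assumptions, not part of the answer's Result class.

```python
class PathTracked:
    """Hypothetical wrapper that records the keys used to reach a value."""

    def __init__(self, val, path=()):
        self._val = val
        self._path = path

    def __getitem__(self, key):
        if self._val is None:
            # Report the full path that led to the None value.
            raise TypeError(
                f"value at path {list(self._path)} is None, "
                f"cannot look up {key!r}"
            )
        try:
            child = self._val[key]
        except (KeyError, IndexError):
            raise LookupError(
                f"no entry {key!r} under path {list(self._path)}"
            ) from None
        # Wrap lazily, extending the recorded path by one key.
        return PathTracked(child, self._path + (key,))

    @property
    def val(self):
        return self._val


wrapped = PathTracked({"a": {"b": None}})
try:
    wrapped["a"]["b"]["c"]
except TypeError as e:
    print(e)  # value at path ['a', 'b'] is None, cannot look up 'c'
```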

Seeking a better way to check an instance property and assign value depending on property type

I'm working with the praw module, and I find that my objects sometimes have a property subreddit that is sometimes a string and that is sometimes an object with its own properties. I've dealt with it using the following:
for c in comments:
    if isinstance(c.subreddit, str):
        subreddit_name = c.subreddit
    else:
        subreddit_name = c.subreddit.display_name
I have two functions where I have to do this, and it's really ugly. Is there a better way to deal with this?
I would use EAFP rather than LBYL:
for c in comments:
    try:
        subreddit_name = c.subreddit.display_name
    except AttributeError:
        subreddit_name = c.subreddit
You could also try getattr, which takes a default like dict.get:
subreddit_name = getattr(c.subreddit, 'display_name', c.subreddit)
This is effectively a neater version of:
subreddit_name = (c.subreddit.display_name
                  if hasattr(c.subreddit, 'display_name')
                  else c.subreddit)

How to avoid type checking arguments to Python function

I'm creating instances of a class Foo, and I'd like to be able to instantiate these in a general way from a variety of types. You can't pass Foo a dict or list. Note that Foo is from a 3rd party code base - I can't change Foo's code.
I know that type checking function arguments in Python is considered bad form. Is there a more Pythonic way to write the function below (i.e. without type checking)?
def to_foo(arg):
    if isinstance(arg, dict):
        return dict([(key,to_foo(val)) for key,val in arg.items()])
    elif isinstance(arg, list):
        return [to_foo(i) for i in arg]
    else:
        return Foo(arg)
Edit: Using try/except blocks is possible. For instance, you could do:
def to_foo(arg):
    try:
        return Foo(arg)
    except ItWasADictError:
        return dict([(key,to_foo(val)) for key,val in arg.items()])
    except ItWasAListError:
        return [to_foo(i) for i in arg]
I'm not totally satisfied by this for two reasons: first, type checking seems like it addresses more directly the desired functionality, whereas the try/except block here seems like it's getting to the same place but less directly. Second, what if the errors don't cleanly map like this? (e.g. if passing either a list or dict throws a TypeError)
Edit: a third reason I'm not a huge fan of the try/except method here is I need to go and find what exceptions Foo is going to throw in those cases, rather than being able to code it up front.
If you're using Python 3.4 or later you can use functools.singledispatch (a backport exists for older Python versions):
from functools import singledispatch
@singledispatch
def to_foo(arg):
    return Foo(arg)
@to_foo.register(list)
def to_foo_list(arg):
    return [Foo(i) for i in arg]
@to_foo.register(dict)
def to_foo_dict(arg):
    return {key: Foo(val) for key, val in arg.items()}
This is a fairly new construct for Python, but a common pattern in other languages. I'm not sure whether you'd call this pythonic, but it does feel better than writing isinstance checks everywhere. Though, in practice, singledispatch is probably just doing the type lookup for you internally.
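For nested input like the question's, the registered handlers would need to call to_foo again rather than Foo directly. The following is a sketch of that recursive variant, with a stand-in Foo class defined here purely for illustration (the real Foo is third-party and unknown):

```python
from functools import singledispatch


class Foo:
    """Stand-in for the third-party class; only stores its argument."""

    def __init__(self, value):
        self.value = value


@singledispatch
def to_foo(arg):
    # Fallback for any type without a registered handler.
    return Foo(arg)


@to_foo.register(list)
def _(arg):
    return [to_foo(item) for item in arg]


@to_foo.register(dict)
def _(arg):
    return {key: to_foo(val) for key, val in arg.items()}


result = to_foo({"a": [1, {"b": 2}]})
print(type(result["a"][0]).__name__)  # Foo
print(result["a"][1]["b"].value)      # 2
```

Dispatch follows the class hierarchy, so subclasses of list or dict are routed to the same handlers.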
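The pythonic way to deal with your issue is to go ahead and assume (first) that arg is something Foo accepts, and catch the error if it is not:
try:
    x = Foo(arg)
except TypeError:  # or whichever exception Foo raises for unsupported input
    pass  # do other things
This idea is usually called EAFP ("easier to ask forgiveness than permission"); it goes hand in hand with duck typing, and it's a popular pattern in Python.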

pythonic way to rewrite an assignment in an if statement

Is there a preferred pythonic way to do this, the way I would in C++:
for s in str:
    if r = regex.match(s):
        print(r.groups())
I really like that syntax, imo it's a lot cleaner than having temporary variables everywhere. The only other way that's not overly complex is
for s in str:
    r = regex.match(s)
    if r:
        print(r.groups())
I guess I'm complaining about a pretty pedantic issue. I just miss the former syntax.
How about
for r in [regex.match(s) for s in str]:
    if r:
        print(r.groups())
or a bit more functional
for r in filter(None, map(regex.match, str)):
    print(r.groups())
Perhaps it's a bit hacky, but using a function object's attributes to store the last result allows you to do something along these lines:
def fn(regex, s):
    fn.match = regex.match(s)  # save result
    return fn.match
for s in strings:
    if fn(regex, s):
        print(fn.match.groups())
Or more generically:
def cache(value):
    cache.value = value
    return value
for s in strings:
    if cache(regex.match(s)):
        print(cache.value.groups())
Note that although the "value" saved can be a collection of a number of things, this approach is limited to holding only one such at a time, so more than one function may be required to handle situations where multiple values need to be saved simultaneously, such as in nested function calls, loops or other threads. So, in accordance with the DRY principle, rather than writing each one, a factory function can help:
def Cache():
    def cache(value):
        cache.value = value
        return value
    return cache
cache1 = Cache()
for s in strings:
    if cache1(regex.match(s)):
        # use another at same time
        cache2 = Cache()
        if cache2(somethingelse) != cache1.value:
            process(cache2.value)
        print(cache1.value.groups())
        ...
There's a recipe to make an assignment expression, but it's very hacky. Your first option doesn't compile, so your second option is the way to go. The recipe is Python 2 code:
## {{{ http://code.activestate.com/recipes/202234/ (r2)
import sys
def set(**kw):
    assert len(kw)==1
    a = sys._getframe(1)
    a.f_locals.update(kw)
    return kw.values()[0]
#
# sample
#
A=range(10)
while set(x=A.pop()):
    print x
## end of http://code.activestate.com/recipes/202234/ }}}
As you can see, production code shouldn't touch this hack with a ten foot, double bagged stick.
This might be an overly simplistic answer, but would you consider this:
for s in str:
    if regex.match(s):
        print(regex.match(s).groups())
There is no pythonic way to do something that is not pythonic. It's that way for a reason. First, allowing statements in the conditional part of an if statement would make the grammar pretty ugly: if you allowed assignment statements in if conditions, why not also allow if statements? How would you actually write that? C-like languages don't have this problem, because they don't have assignment statements; they make do with just assignment expressions and expression statements.
The second reason is that
if foo = bar:
    pass
looks very similar to
if foo == bar:
    pass
Even if you are clever enough to type the correct one, and even if most of the members of your team are sharp enough to notice it, are you sure that the one you are looking at now is exactly what is supposed to be there? It's not unreasonable for a new dev to see this and just "fix" it (one way or the other), and now it's definitely wrong.
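For readers on Python 3.8 or later, the language has since gained assignment expressions (PEP 572), written with := rather than =, which keeps the two forms visually distinct. A minimal sketch, where the pattern and the sample strings are made up for illustration:

```python
import re

# Hypothetical pattern and inputs, chosen only to show one miss and two hits.
regex = re.compile(r"(\w+)@(\w+)")
strings = ["alice@example", "no match here", "bob@test"]

found = []
for s in strings:
    # := binds r and yields the match object (or None) in one expression.
    if (r := regex.match(s)):
        found.append(r.groups())

print(found)  # [('alice', 'example'), ('bob', 'test')]
```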
Whenever I find that my loop logic is getting complex I do what I would with any other bit of logic: I extract it to a function. In Python it is a lot easier than some other languages to do this cleanly.
So extract the code that just generates the items of interest:
def matching(strings, regex):
    for s in strings:
        r = regex.match(s)
        if r:
            yield r
and then when you want to use it, the loop itself is as simple as they get:
for r in matching(strings, regex):
    print(r.groups())
Yet another answer is to use the "Assign and test" recipe for allowing assigning and testing in a single statement published in O'Reilly Media's July 2002 1st edition of the Python Cookbook and also online at Activestate. It's object-oriented, the crux of which is this:
# from http://code.activestate.com/recipes/66061
class DataHolder:
    def __init__(self, value=None):
        self.value = value
    def set(self, value):
        self.value = value
        return value
    def get(self):
        return self.value
This can optionally be modified slightly by adding the custom __call__() method shown below to provide an alternative way to retrieve instances' values -- which, while less explicit, seems like a completely logical thing for a 'DataHolder' object to do when called, I think.
    def __call__(self):
        return self.value
Allowing your example to be re-written:
r = DataHolder()
for s in strings:
    if r.set(regex.match(s)):
        print(r.get().groups())
        # or
        print(r().groups())
As also noted in the original recipe, if you use it a lot, adding the class and/or an instance of it to the __builtin__ module (named builtins in Python 3) to make it globally available is very tempting despite the potential downsides:
import __builtin__
__builtin__.DataHolder = DataHolder
__builtin__.data = DataHolder()
As I mentioned in my other answer to this question, it must be noted that this approach is limited to holding only one result/value at a time, so more than one instance is required to handle situations where multiple values need to be saved simultaneously, such as in nested function calls, loops or other threads. That doesn't mean you should use it or the other answer, just that more effort will be required.
