I have an object called Song, which is defined as:
class Song(object):
    def __init__(self):
        self.title = None
        self.songauthor = None
        self.textauthor = None
        self.categories = None
Inside this class I have a method that parses a run-time property of that object, "metadata", which is basically just a text file with some formatted text that I parse with regular expressions. During this process, I have come up with the following code that I am pretty certain can be simplified to a loop.
re_title = re.compile("^title:(.*)$", re.MULTILINE)
re_textauthor = re.compile("^textauthor:(.*)$", re.MULTILINE)
re_songauthor = re.compile("^songauthor:(.*)$", re.MULTILINE)
re_categories = re.compile("^categories:(.*)$", re.MULTILINE)
#
# it must be possible to simplify the below code to a loop...
#
tmp = re_title.findall(self.metadata)
self.title = tmp[0] if len(tmp) > 0 else None
tmp = re_textauthor.findall(self.metadata)
self.textauthor = tmp[0] if len(tmp) > 0 else None
tmp = re_songauthor.findall(self.metadata)
self.songauthor = tmp[0] if len(tmp) > 0 else None
tmp = re_categories.findall(self.metadata)
self.categories = tmp[0] if len(tmp) > 0 else None
I'm guessing this can be done by encapsulating a reference to the property (e.g. self.title) and the corresponding regular expression (re_title) in a datatype (possibly tuple), and then iterate over a list of these data types.
I have tried using a tuple, like this:
for x in ((self.title, re_title),
          (self.textauthor, re_textauthor),
          (self.songauthor, re_songauthor),
          (self.categories, re_categories)):
    data = x[1].findall(self.metadata)
    x[0] = data[0] if len(data) > 0 else None
This failed horribly, as I cannot modify a tuple at run time. Can anyone provide a suggestion as to how I can pull this off?
There are two problems with your code.
The big one is that x[0] is not a reference to self.title, it's a reference to the value of self.title. In other words, you're just copying the existing title into a tuple, then replacing that title in the tuple with a different one, which has no effect on the existing title.
The smaller one is that you can't replace elements in a tuple. You could fix that trivially by using a list instead of a tuple, but you're still going to have the big problem.
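A minimal sketch of the big problem, using a hypothetical Demo class that is not part of the question: putting self.title into a list copies its current value, so rebinding the list slot leaves the attribute alone, while assigning by attribute name does change the object.

```python
class Demo(object):
    """Hypothetical stand-in for Song, used only for this illustration."""
    def __init__(self):
        self.title = None

d = Demo()
pair = [d.title, "unused"]   # stores the current value (None), not a reference to the attribute
pair[0] = "New title"        # rebinds the list slot only
title_after_list = d.title   # the attribute is unchanged

setattr(d, "title", "New title")  # assigning by attribute name updates the object
title_after_setattr = d.title
```

Switching the tuple to a list removes the TypeError, but as the sketch shows, it does not make the assignment reach the object.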
So, how do you create references to variables in Python? You can't. You need to think of a way to reorganize things. For example, maybe you can access these things by name instead of by reference. Instead of four separate attributes, store the four values in a single dictionary, keyed by name:
import re

res = {
    'title': re.compile("^title:(.*)$", re.MULTILINE),
    'textauthor': re.compile("^textauthor:(.*)$", re.MULTILINE),
    'songauthor': re.compile("^songauthor:(.*)$", re.MULTILINE),
    'categories': re.compile("^categories:(.*)$", re.MULTILINE),
}

class Song(object):
    def __init__(self):
        self.properties = {}

    def parsify(self, text):
        # text is the metadata string to parse
        for thing in ('title', 'textauthor', 'songauthor', 'categories'):
            data = res[thing].findall(text)
            self.properties[thing] = data[0] if len(data) > 0 else None
You could also use for thing in res: there, because that will iterate over all the keys (in arbitrary order, but you probably don't care about the order).
If you really need to have self.title, you've run into a common problem. Usually there's a clear distinction between data (which should be referred to by runtime strings) and attributes (which should not). But sometimes there isn't, so you have to bridge between them in some way. You can create four @property fields that return self.properties['title'] and so on, or you can use setattr(self, thing, …) instead of self.properties[thing], or various other possibilities. Which one is best comes down to whether they're more data-like or more attribute-like.
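A rough sketch of the @property bridge, under a few assumptions that are not in the question: the FIELDS and PATTERNS names, a constructor that takes the metadata string, and the sample metadata are all made up for illustration. Only title is bridged here; the other three fields would follow the same pattern.

```python
import re

# Hypothetical names: one compiled pattern per metadata field.
FIELDS = ("title", "textauthor", "songauthor", "categories")
PATTERNS = {name: re.compile(r"^%s:(.*)$" % name, re.MULTILINE) for name in FIELDS}

class Song(object):
    def __init__(self, metadata):
        self.metadata = metadata
        self.properties = {}
        self.parse()

    def parse(self):
        # Store every field by name; a missing field becomes None.
        for name in FIELDS:
            found = PATTERNS[name].findall(self.metadata)
            self.properties[name] = found[0] if found else None

    @property
    def title(self):
        # Read-only bridge from attribute access to the dictionary entry.
        return self.properties["title"]

song = Song("title:Example\nsongauthor:Someone\n")
```

With this layout song.title reads from the dictionary, so the loop and the attribute never get out of sync.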
Instead of assigning to the tuple, update the class members directly:
all_res = {'title': re_title,
           'textauthor': re_textauthor,
           'songauthor': re_songauthor,
           'categories': re_categories}
for k, v in all_res.items():
    tmp = v.findall(self.metadata)
    if tmp:
        setattr(self, k, tmp[0])
    else:
        setattr(self, k, None)
If you only care about the first match, you don't need to use findall.
abarnert's answer has given a good explanation of what is going wrong with your code, but I wanted to offer up an alternative solution. Rather than using a loop to assign each variable, try creating an iterable of the different values from the parsed file, then use a single unpacking-assignment to get them into the various variables.
Here's a two-statement solution using a list comprehension, which is made just a bit tricky by the fact that you need to reference the result of findall twice in the if/else expression (thus the nested generator expression):
vals = [x[0] if len(x) > 0 else None for x in (regex.findall(self.metadata) for regex in
[re_title, re_textauthor,
re_songauthor, re_categories])]
self.title, self.textauthor, self.songauthor, self.categories = vals
You can probably simplify things a little bit in the first part of the list comprehension. To start with, you can just test if x rather than if len(x) > 0. Or, if you're not too attached to using findall, you could use search instead, then just use x and x.group(1) instead of the whole if/else bit (group(1) is the captured text, which is what findall returns for these patterns). The search method returns None if no match was found, so the short-circuiting behavior of the and operator will do exactly what we want.
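A sketch of the search-based variant, assuming a made-up metadata string and module-level variables in place of the self attributes. It uses group(1) so the result is the captured text, the same thing findall would return for a pattern with one group.

```python
import re

metadata = "title:Example\ntextauthor:A. Writer\ncategories:folk\n"  # made-up sample

names = ["title", "textauthor", "songauthor", "categories"]
regexes = [re.compile(r"^%s:(.*)$" % n, re.MULTILINE) for n in names]

# search() returns None when nothing matches, so `m and m.group(1)` gives
# None for a missing field and the captured text otherwise.
vals = [m and m.group(1) for m in (rx.search(metadata) for rx in regexes)]
title, textauthor, songauthor, categories = vals
```

The unpacking assignment relies on names and the targets on the left being in the same order, which is the main thing to keep an eye on if fields are added later.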
An example would be to use a dictionary like this:
things = {}
# Note: x[0] is the current value of each attribute (e.g. None), not its name
for x in ((self.title, re_title),
          (self.textauthor, re_textauthor),
          (self.songauthor, re_songauthor),
          (self.categories, re_categories)):
    if len(x[1].findall(self.metadata)):
        things[x[0]] = x[1].findall(self.metadata)[0]
    else:
        things[x[0]] = None
Could this be a possible solution?
I have to update a nested JSON object.
If I knew the specifics of which items were to be updated I could do :
json_object['basket']['items']['apple'] = 'new value'
However, my list of elements to target is dynamic.
> basket.items.apple = 'green'
> name = 'my shopping'
> basket.cost = '15.43'
I could do this by looping through elements.
Find 'basket' > then find 'items' > then find 'apple' > set value
Find 'name' > set value
However, I was hoping that there was a way to just reference it directly/dynamically.
i.e. from a string 'basket.cost', build the expression :
json_object['basket']['cost']
P.s. it has to cope with lists of dictionaries too !
Any guidance appreciated :)
Once you have the string "basket.cost", you can split it on "." and it's pretty easy to drill down into json_object['basket']['cost'] using a loop. Functionally, there is no difference between doing this and doing it "directly": you are still getting the 'basket' key first, and then getting the 'cost' key from the value of json_object['basket'].
def get_element(d, path):
    # This function can take the string "basket.cost", or the list ["basket", "cost"]
    if isinstance(path, str):
        path = path.split(".")
    for p in path:
        d = d[p]
    return d

def set_element(d, path, value):
    path = path.split(".")
    dict_to_set = get_element(d, path[:-1])
    key_to_set = path[-1]
    dict_to_set[key_to_set] = value

set_element(json_object, "basket.items.apple", 100)
Now, this assumes all elements of your path already exist, so let's say you create a dictionary that looks like so:
json_object = {"basket": {"items": dict()}}
set_element(json_object, "basket.items.apple", 100)
set_element(json_object, "basket.cost", 10)
print(json_object)
# Output: {'basket': {'items': {'apple': 100}, 'cost': 10}}
print(get_element(json_object, "basket.cost"))
# Output: 10
If you try to access an element that doesn't already exist, you get a KeyError:
get_element(json_object, "basket.date")
# KeyError: 'date'
This also happens if you try to set a value in an element that doesn't exist:
set_element(json_object, "basket.date.day", 1)
# KeyError: 'date'
If we want to allow your function to create the dictionaries when they don't exist, we can modify the get_element function to account for this situation and add the key:
def get_element(d, path, create_missing=False):
    # This function can take the string "basket.cost", or an iterable containing the elements "basket" and "cost"
    if isinstance(path, str):
        path = path.split(".")
    for p in path:
        if create_missing and p not in d:
            d[p] = dict()
        d = d[p]
    return d

def set_element(d, path, value, create_missing=True):
    path = path.split(".")
    dict_to_set = get_element(d, path[:-1], create_missing)
    key_to_set = path[-1]
    dict_to_set[key_to_set] = value

set_element(json_object, "basket.date.day", 1)
print(json_object)
# Output: {'basket': {'items': {'apple': 100}, 'cost': 10, 'date': {'day': 1}}}
If using a third-party package is an option, you can try python-box. It comes with lots of options and utilities for loading from JSON and YAML files. The implementation is optimized for speed using Cython.
from box import Box
test_data = {
"basket": {
"products": [
{"name": "apple", "colour": "green"}
],
}
}
a = Box(test_data)
a.basket.cost = 12.3
a.basket.products[0].colour = "pink"
a.basket.products.append({"name": "pineapple", "taste": "sweet"})
print(a.basket.products[1].taste)
You can get exactly what you want by overloading some Python magic methods: __getattr__ and __setattr__. I'll show an example of the API to whet the appetite and then the full code:
test_data = {'basket': {'items': [{'name': 'apple', 'colour': 'green'},
{'name': 'pineapple', 'taste': 'sweet',},
],
'cost': 12.3,
},
'name': 'Other'}
o = wrap(test_data) # This wraps with the correct class, depending if it is a dict or a list
print(o.name) # Prints 'Other'
print(o.basket.items) # Prints the list of items
print(o.basket.cost) # Prints 12.3
o.basket.cost = 10.0 # Changes the cost
assert o.basket.cost == 10.0
assert len(o) == 2
assert len(o.basket.items) == 2
o.basket.items.append({'name': 'orange'})
o.basket.items[2].colour = 'yellow' # It works with lists!
assert o.basket.items[2].name == 'orange'
assert o.basket.items[2].colour == 'yellow'
# You can get a part of it and it holds a reference to the original
b = o.basket
b.type = 'groceries'
assert o.basket.type == 'groceries'
# It is also possible to create a separate wrapped part and then join:
employees = wrap({})
employees.Clara.id = 101
employees.Clara.age = 23
employees.Lucia.id = 102
employees.Lucia.age = 29
o.employees = employees
The implementation is based on special wrapper classes, one for dicts and another for lists, which both inherit from a base class. Note that the constructor uses super().__setattr__ instead of simply assigning self._data, because we override __getattr__ and __setattr__ to look for the data inside _data; a plain assignment would go through the overridden __setattr__, which needs _data before it exists, and recurse forever.
from collections.abc import Mapping, Sequence, MutableSequence
class BaseWrapper:
    __slots__ = ('_data',)  # trailing comma makes this a one-element tuple

    def __init__(self, data):
        super().__setattr__('_data', data)

    def __repr__(self):
        return f'{self.__class__.__name__}({repr(self._data)})'
The wrapper for dictionaries is the most interesting: it uses __getattr__ to look for a key in the wrapped dictionary. This allows for a very natural API: if o is a wrapped dictionary, o.entry will give the same result as o['entry']. Most of the code should be self-explanatory, there are only two tricks: the first is that __getattr__ checks if the output is a dict or list and wraps it. This allows for chaining of calls like o.basket.cost. The downside is that a new wrapper is created every call. The second trick is when setting an attribute: it checks if what is being set is a wrapped instance and un-wraps it. Thus, wrapped dictionaries can be combined and the underlying dictionary is always "clean".
class MappingWrapper(BaseWrapper):
    """Wraps a dictionary and provides the keys of the dictionary as class members.
    Create new keys when they do not exist."""

    def __getattr__(self, name):
        # Note: these two lines allow automatic creation of attributes, e.g. in an object 'o'
        # that doesn't have an attribute 'car', the following is possible:
        # >> o.car.colour = 'blue'
        # And all the missing levels will be automatically created
        if name not in self._data and not name.startswith('_'):
            self._data[name] = {}
        return wrap(self._data[name])

    def __setattr__(self, name, value):
        self._data[name] = unwrap(value)

    # Implements standard dictionary access
    def __getitem__(self, name):
        return wrap(self._data[name])

    def __setitem__(self, name, value):
        self._data[name] = unwrap(value)

    def __delitem__(self, name):
        del self._data[name]

    def __len__(self):
        return len(self._data)
The list wrapper is simpler, no need to mess around with attribute access. The only special care we have to take is to wrap and unwrap the list elements when one is requested/set. Note that, just like with the dictionary wrapper, the same wrap and unwrap functions are used (in __getitem__/__setitem__/insert).
class ListWrapper(BaseWrapper, MutableSequence):
    """Wraps a list. Essentially, provides wrapping of elements of the list."""

    def __getitem__(self, idx):
        return wrap(self._data[idx])

    def __setitem__(self, idx, value):
        self._data[idx] = unwrap(value)

    def __delitem__(self, idx):
        del self._data[idx]

    def __len__(self):
        return len(self._data)

    def insert(self, index, obj):
        self._data.insert(index, unwrap(obj))
Finally, the definition of wrap, which just selects the correct wrapper based on the type of the input, and unwrap, which extracts the raw data:
def wrap(obj):
    if isinstance(obj, dict):
        return MappingWrapper(obj)
    if isinstance(obj, list):
        return ListWrapper(obj)
    return obj

def unwrap(obj):
    if isinstance(obj, BaseWrapper):
        return obj._data
    return obj
The full code can be found in this gist.
An important caveat: to keep the implementation simple, wrapper objects are created at every access. Thus using this method inside large loops may cause performance issues (per my measurements, this method of access is between 12 and 30 times slower).
I'm going to assume that you already know how to handle the value errors that will probably come up with this nested collection accessing, so I won't focus on it in my approach.
I would split this in two parts:
Traversing a nested collection according to a list of keys for each level
Getting a list of keys out of a string
The first one is quite trivial: as you said, simply looping through the keys and following each one gives you access to the collection element in question. A simple implementation could look something like this:
def get_nested(collection, key):
    for part in key:
        collection = collection[part]
    return collection

def set_nested(collection, key, value):
    for part in key[:-1]:
        collection = collection[part]
    collection[key[-1]] = value
Here the key is expected to be some iterable of keys, such as a tuple or list.
Of course that means there is an expectation that your string representing a path along the collection is already parsed. We can get to that next.
This step would also be very trivial, since one could simply expression.split(".") it. However, since you also want to be able to index nested lists along with dicts, it gets a little more complicated.
There is a tradeoff to be made here. One could simply say: "Any time that one of the items in expression.split(".") can be parsed to an int, we will do just that, and assume that it was meant as an index in a list". Naturally, that isn't necessarily the case: there is nothing preventing you from using a number in string form as a key in a dict. However, if you think this is never going to be the case for you, perhaps you can just call it like this:
set_nested(
collection,
(int(part) if part.isdigit() else part for part in expression.split(".")),
"target value",
)
(or of course wrap it in another function like this).
However if the consideration of using digit keys in dicts is important for you, there is another solution:
Whenever traversing the nested collection downward, we check if the collection we are currently looking at is a list. Only if it is a list, do we actually try to parse the path part as an int.
This would be the respective set_nested and get_nested functions for that:
def get_nested(collection, key: str):
    for part in key.split("."):
        if isinstance(collection, list):
            part = int(part)  # only lists get integer indexes
        collection = collection[part]
    return collection

def set_nested(collection, key: str, val):
    key = key.split(".")
    for i, part in enumerate(key):
        if isinstance(collection, list):
            part = int(part)
        if i == len(key) - 1:
            collection[part] = val
        else:
            collection = collection[part]
I believe that's the simplest solution to your problem, though of course it's important to keep in mind:
There is no error handling in this code, and indexing on dynamic paths is a topic where you are bound to run into errors. Depending on where and how you want to handle those it's going to be easy or very tedious.
There is no handling for setting values in dicts that don't exist yet, or for expanding lists to a specific size, but since you didn't mention those as a requirement I'm presuming it's not an issue. It might be for others reading this.
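For readers who do need the error handling mentioned above, here is one possible sketch. The function name get_nested_or_default, the default parameter, and the sample data are assumptions of this example, not part of the answer's code; it simply returns a fallback value when any step of the path fails.

```python
def get_nested_or_default(collection, key, default=None):
    """Hypothetical variant of get_nested that returns `default` on a bad path."""
    for part in key.split("."):
        try:
            if isinstance(collection, list):
                part = int(part)           # ValueError if the segment is not an integer
            collection = collection[part]  # KeyError / IndexError if it is missing
        except (KeyError, IndexError, ValueError, TypeError):
            # TypeError covers indexing into a leaf value such as a string or number.
            return default
    return collection

data = {"basket": {"items": [{"name": "apple"}], "cost": "15.43"}}  # made-up sample
apple = get_nested_or_default(data, "basket.items.0.name")
missing = get_nested_or_default(data, "basket.items.3.name", default="n/a")
```

Swallowing every failure into one default is a design choice; raising a single custom exception that names the failing segment would be the stricter alternative.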
This is tricky, and I would discourage it unless necessary, as it is an easy thing to design and implement badly.
First: it's easy to split on path separator and follow the object tree to the desired key.
But after a while questions will start to appear. E.g.: what separator to split on?
A slash? It can appear in the JSON dictionary key... A dot? Same.
We'll need to either restrict legal / handled paths or implement some kind of escaping mechanism.
How do you handle empty strings?
Another goal: handle lists... Ok. So how do we interpret a path a.0? Is it ['a'][0] or ['a']['0'] ?
It seems that we'll have to complicate the language or drop the requirement.
So, in general, I'd avoid it. Still, here's a quick implementation whose design choices may or may not satisfy you:
there's basic backslash escaping of path separator
empty string is allowed as a key
lists are not handled due to ambiguity
def deep_set(root: dict, path: str, value):
    segments = [*iter_segments(path, '.')]
    for k in segments[:-1]:
        root = root[k]
    root[segments[-1]] = value

def iter_segments(path: str, separator: str = '.'):
    segment = ''
    path_iter = iter(path)
    while True:
        c = next(path_iter, '')
        if c in (separator, ''):
            # end of a segment (or of the whole path)
            yield segment
            segment = ''
            if c == '':
                break
            continue
        elif '\\' == c:
            # backslash: take the next character literally
            c = next(path_iter, '')
        segment += c
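A short usage sketch of the escaping rules listed above. The two functions are repeated so the snippet runs on its own, with the separator parameter actually used in the comparison; the sample dictionary is made up.

```python
def iter_segments(path: str, separator: str = '.'):
    segment = ''
    path_iter = iter(path)
    while True:
        c = next(path_iter, '')
        if c in (separator, ''):
            yield segment
            segment = ''
            if c == '':
                break
            continue
        elif c == '\\':
            c = next(path_iter, '')  # escaped character is taken literally
        segment += c

def deep_set(root: dict, path: str, value):
    segments = list(iter_segments(path, '.'))
    for k in segments[:-1]:
        root = root[k]
    root[segments[-1]] = value

plain = list(iter_segments("basket.cost"))
escaped = list(iter_segments(r"a\.b.c"))   # the first dot is part of the key "a.b"
empty_key = list(iter_segments("a..b"))    # the empty string is a legal key

doc = {"a.b": {}, "basket": {}}            # made-up sample document
deep_set(doc, r"a\.b.c", 1)
deep_set(doc, "basket.cost", "15.43")
```

Note that intermediate dictionaries must already exist here; deep_set raises KeyError otherwise.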
Let's say I know beforehand that the string
"key1:key2[]:key3[]:key4" should map to "newKey1[]:newKey2[]:newKey3"
then given "key1:key2[2]:key3[3]:key4",
my method should return "newKey1[2]:newKey2[3]:newKey3"
(the order of numbers within the square brackets should stay, like in the above example)
My solution looks like this:
import re

predefined_mapping = {"key1:key2[]:key3[]:key4": "newKey1[]:newKey2[]:newKey3"}

def transform(parent_key, parent_key_with_index):
    indexes_in_parent_key = re.findall(r'\[(.*?)\]', parent_key_with_index)
    target_list = predefined_mapping[parent_key].split(":")
    t = []
    i = 0
    for elem in target_list:
        try:
            sub_result = re.subn(r'\[(.*?)\]', '[{}]'.format(indexes_in_parent_key[i]), elem)
            if sub_result[1] > 0:
                i += 1
            new_elem = sub_result[0]
        except IndexError as e:
            new_elem = elem
        t.append(new_elem)
    print(":".join(t))
transform("key1:key2[]:key3[]:key4", "key1:key2[2]:key3[3]:key4")
prints newKey1[2]:newKey2[3]:newKey3 as the result.
Can someone suggest a better and elegant solution (around the usage of regex especially)?
Thanks!
You can do it a bit more elegantly by simply splitting the mapped structure on [], then interspersing the indexes from the actual data and, finally, joining everything together:
import itertools
import re

# split the map immediately on [] so that you don't have to split each time on transform
predefined_mapping = {"key1:key2[]:key3[]:key4": "newKey1[]:newKey2[]:newKey3".split("[]")}

def transform(key, source):
    mapping = predefined_mapping.get(key, None)
    if not mapping:  # no mapping for this key found, return unaltered
        return source
    indexes = re.findall(r'\[.*?\]', source)  # get individual indexes
    return "".join(i for e in itertools.izip_longest(mapping, indexes) for i in e if i)
print(transform("key1:key2[]:key3[]:key4", "key1:key2[2]:key3[3]:key4"))
# newKey1[2]:newKey2[3]:newKey3
NOTE: On Python 3 use itertools.zip_longest() instead.
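For reference, a Python 3 sketch of the same split-and-intersperse idea. It swaps in itertools.zip_longest with fillvalue="" so the pieces can be concatenated directly; that fillvalue shortcut is a small change from the code above, not something the answer prescribes.

```python
import re
from itertools import zip_longest

# Split the target pattern on "[]" once, up front.
predefined_mapping = {
    "key1:key2[]:key3[]:key4": "newKey1[]:newKey2[]:newKey3".split("[]"),
}

def transform(key, source):
    mapping = predefined_mapping.get(key)
    if not mapping:  # no mapping for this key: return the source unaltered
        return source
    indexes = re.findall(r"\[.*?\]", source)  # e.g. ["[2]", "[3]"]
    # Pair each text piece with the index that follows it; pad the shorter side with "".
    pieces = zip_longest(mapping, indexes, fillvalue="")
    return "".join(part + index for part, index in pieces)

result = transform("key1:key2[]:key3[]:key4", "key1:key2[2]:key3[3]:key4")
unknown = transform("no:such:key", "no:such:key[1]")
```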
I still think you're over-engineering this and that there is probably a much more elegant and far less error-prone approach to the whole problem. I'd advise stepping back and looking at the bigger picture instead of hammering out this particular solution just because it seems to be addressing the immediate need.
The code below works, but looks very ugly. I'm looking for a more pythonic way to write the same thing.
The goal:
React on a result of a function that returns multiple values.
Example function
def myfilterfunc(mystr):
    if 'la' in mystr:
        return True, mystr
    return False, None
This returns True and a string (if the string contains "la"), or False and nothing.
In a second function, I'm passing myfilterfunc as an optional parameter
def mymainfunc(mystr,filterfunc=None):
This function fills a returnlist. If no function is given, the result is not filtered and is added as is. If a filter function is given and it returns True, the returned string is added. (This is just an example that would easily work with one return value, but I'm trying to get the syntax right for a more complicated setup.)
if filterfunc:
    tmp_status,tmp_string = filterfunc(mystr[startpos:nextitem])
    if tmp_status:
        returnlist.append(tmp_string)
else:
    returnlist.append(mystr[startpos:nextitem])
Any idea how I can write this without using temporary variables to store the return values of the function?
Full "working" test code below
def string2list(mystr,splitlist,filterfunc=None):
    returnlist = []
    startpos = 0
    nextitem = -1
    matched = True
    while matched:
        matched = False
        for sub in splitlist:
            if startpos == 0:
                tmpi = mystr.find(sub)
            else:
                tmpi = mystr.find(sub,startpos + 1)
            if (tmpi > 0) and ((nextitem < 0) or (nextitem > tmpi)):
                nextitem = tmpi
                matched = True
        if filterfunc:
            tmp_status,tmp_string = filterfunc(mystr[startpos:nextitem])
            if tmp_status:
                returnlist.append(tmp_string)
        else:
            returnlist.append(mystr[startpos:nextitem])
        startpos = nextitem
        nextitem = -1
    return returnlist

def myfilterfunc(mystr):
    if 'la' in mystr:
        return True,mystr
    return False,''
splitlist = ['li','la']
mytext = '''
li1
li2
li3
fg4
fg5
fg6
la7
la
la
tz
tz
tzt
tz
end
'''
print string2list(mytext,splitlist)
print
print string2list(mytext,splitlist,myfilterfunc)
If this is going to happen often, you can factor out the ugliness:
def filtered(f, x):
    if f:
        status, result = f(x)
        return result if status else x
    else:
        return x
used like
returnlist.append(filtered(filterfunc, mystr[startpos:nextitem]))
so that if you have many similar optional filters the code remains readable. This works because in Python functions/closures are first class citizens and you can pass them around like other values.
But then if the logic is about always adding (either the filtered or the unfiltered) why not just write the filter to return the input instead of (False, "") in case of failure?
That would make the code simpler to understand...
returnlist.append(filterfunc(mystr[startpos:nextitem]))
I think there are two better approaches to your problem that don't involve using two return values.
The first is to simply return a Boolean value and not a string at all. This works if your filter is always going to return the string it was passed unmodified whenever it returns a string at all (i.e. whenever the first value is True). This approach will let you avoid using temporary values at all:
if filterfunc:
    if filterfunc(mystr[startpos:nextitem]):
        returnlist.append(mystr[startpos:nextitem])
(Note, I'd suggest renaming filterfunc to predicate if you go this route.)
The other option will work if some filterfunc might return a different second value than it was passed under some situations, but never the 2-tuple True, None. In this approach you simply use the single value as both the signal and the payload. If it's None, you ignore it. If it's anything else, you use it. This does require a temporary variable, but only one (and it's a lot less ugly).
if filterfunc:
    result = filterfunc(mystr[startpos:nextitem])
    if result is not None:
        returnlist.append(result)
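A self-contained sketch of the second approach, where None means "drop this item". The names keep_if_contains_la and collect, and the sample chunks, are hypothetical; the loop also handles the no-filter case, which the fragment above leaves out.

```python
def keep_if_contains_la(text):
    """Hypothetical filter: return the text to keep, or None to drop it."""
    return text if "la" in text else None

def collect(chunks, filterfunc=None):
    results = []
    for chunk in chunks:
        if filterfunc is None:
            results.append(chunk)      # no filter given: keep everything as is
            continue
        result = filterfunc(chunk)     # single value is both the signal and the payload
        if result is not None:
            results.append(result)
    return results

chunks = ["li1", "la7", "tz", "lala"]  # made-up sample data
unfiltered = collect(chunks)
filtered_chunks = collect(chunks, keep_if_contains_la)
```

This only works as long as None is never a legitimate payload, which is the condition stated above.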
I am working with data pulled from a spreadsheet-like file. I am trying to find, for each "ligand", the item with the lowest corresponding "energy". To do this I'm trying to make a list of all the ligands I find in the file, and compare them to one another, using the index value to find the energy of each ligand, keeping the one with the lowest energy. However, the following loop is not working out for me. The program won't finish, it just keeps running until I cancel it manually. I'm assuming this is due to an error in the structure of my loop.
for item in ligandList:
    for i in ligandList:
        if ligandList.index(item) != ligandList.index(i):
            if ( item == i ) :
                if float(lineList[ligandList.index(i)][42]) < float(lineList[ligandList.index(item)][42]):
                    lineList.remove(ligandList.index(item))
                else:
                    lineList.remove(ligandList.index(i))
As you can see, I've created a separate ligandList containing the ligands, and am using the current index of that list to access the energy values in the lineList.
Does anyone know why this isn't working?
It is a bit hard to answer without some actual data to play with, but I hope this works, or at least leads you into the right direction:
for idx1, item1 in enumerate(ligandList):
    for idx2, item2 in enumerate(ligandList):
        if idx1 == idx2: continue
        if item1 != item2: continue
        if float(lineList[idx1][42]) < float(lineList[idx2][42]):
            del lineList[idx1]
        else:
            del lineList[idx2]
That’s a really inefficient way of doing things. Lots of index calls. It might just feel infinite because it’s slow.
Zip your related things together:
import itertools
l = zip(ligandList, lineList)
Sort them by “ligand” and “energy” (converted to a number, so the comparison is numeric rather than alphabetical):
l = sorted(l, key=lambda t: (t[0], float(t[1][42])))
Grab the first (lowest) “energy” line for each:
l = ((lig, next(lin)[1]) for lig, lin in itertools.groupby(l, key=lambda t: t[0]))
Yay.
result = ((lig, next(lin)[1]) for lig, lin in itertools.groupby(
    sorted(zip(ligandList, lineList), key=lambda t: (t[0], float(t[1][42]))),
    lambda t: t[0]
))
It would probably look more flattering if you made lineList contain classes of some kind.
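A runnable Python 3 sketch of the zip, sort, groupby pipeline on made-up data. The helper make_row and the sample energies are invented for illustration; only the column index 42 comes from the question.

```python
import itertools

ENERGY_COL = 42  # column index taken from the question

def make_row(energy):
    """Build a made-up 43-column row with the energy stored as text in column 42."""
    row = [""] * (ENERGY_COL + 1)
    row[ENERGY_COL] = str(energy)
    return row

ligandList = ["A", "B", "A", "B", "C"]
lineList = [make_row(e) for e in (-5.0, -1.5, -7.2, -0.3, -2.0)]

# Sort by ligand, then by numeric energy, so each group starts with its lowest energy.
pairs = sorted(zip(ligandList, lineList),
               key=lambda t: (t[0], float(t[1][ENERGY_COL])))

# next(group) is the first (ligand, row) pair of each group; [1] picks the row.
best = {lig: next(group)[1]
        for lig, group in itertools.groupby(pairs, key=lambda t: t[0])}
lowest = {lig: float(row[ENERGY_COL]) for lig, row in best.items()}
```

Nothing is deleted from the input lists, which sidesteps the index-shifting problems of removing items while looping.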
You look like you're trying to find the element in ligandList with the smallest value in index 42. Let's just do that....
min(ligandList, key=lambda x: float(x[42]))
If these "Ligands" are something you use regularly, STRONGLY consider writing a class wrapper for them, something like:
class Ligand(object):
    def __init__(self,lst):
        self.attr_name = lst[index_of_attr] # for each attribute
        ... # for each attribute
        ... # etc etc
        self.energy = lst[42]

    def __str__(self):
        """This method defines what the class looks like if you call str() on
        it, e.g. a call to print(Ligand) will show this function's return value."""
        return "A Ligand with energy {}".format(self.energy) # or w/e

    def transmogfiscate(self,other):
        pass # replace this with whatever Ligands do, if they do things...
In which case you can simply create a list of the Ligands:
ligands = [Ligand(ligand) for ligand in ligandList]
and return the object with the smallest energy:
lil_ligand = min(ligands, key=lambda ligand: ligand.energy)
As a huge aside, PEP 8 encourages the use of the lowercase naming convention for variables, rather than mixedCase as many languages use.
I have a method that currently returns None or a dict.
result,error = o.apply('grammar')
The caller currently has to check for the existence of two keys to decide what kind of object was returned.
if 'imperial' in result:
    pass  # yay
elif 'west' in result:
    pass  # yahoo
else:
    pass  # something wrong?
Because result can be None, I'm thinking of returning an empty dict instead, so the caller does not need to check for that. What do you think ?
For comparison, in the re module, calling match can return None.
p = re.compile(r'\w+')
m = p.match( 'whatever' )
But in this case, m is an object instance. In my case, I am returning a dict which should either be empty or have some entries.
Yes I think returning an empty dict (or where applicable an empty list) is preferable to returning None as this avoids an additional check in the client code.
EDIT:
Adding some code sample to elaborate:
def result_none(choice):
    mydict = {}
    if choice == 'a':
        mydict['x'] = 100
        mydict['y'] = 1000
        return mydict
    else:
        return None

def result_dict(choice):
    mydict = {}
    if choice == 'a':
        mydict['x'] = 100
        mydict['y'] = 1000
    return mydict

test_dict = result_dict('b')
if test_dict.get('x'):
    print('Got x')
else:
    print('No x')

test_none = result_none('b')
if test_none.get('x'):  # raises AttributeError: test_none is None
    print('Got x')
else:
    print('No x')
In the above code, the check test_none.get('x') throws an AttributeError because result_none can return None. To avoid that I have to add an additional check and rewrite that line as: if test_none is not None and test_none.get('x'). That check is not needed at all if the method returns an empty dict. As the example shows, the check test_dict.get('x') works fine because result_dict returns an empty dict.
I'm not entirely sure of the context of this code, but I'd say returning None suggests that there was somehow an error and the operation could not be completed. Returning an empty dictionary suggests success, but nothing matched the criteria for being added to the dictionary.
I come from a completely different background (C++ Game Development) so take this for what it's worth:
For performance reasons though, might be nice to return None and save whatever overhead, though minimal, may be involved in creating an empty dictionary. I find that, generally, if you're using a scripting language, you're not concerned about the performance of that code. If you were, you probably wouldn't be writing that feature in said language unless required for some unavoidable reason.
As others have said, an empty dict is falsy, so there's no problem there. But the idea of returning an empty dict leaves a bad taste in my mouth. I can't help but feel that returning an empty dict could hide errors that returning None would reveal. Still, it's just a gut feeling.
After more thought, I think returning an empty dict might be more Pythonic. A good rule of thumb might be to always return an empty container if you write a function/method which returns a container. Several examples of this behavior:
"".split() == []
filter(lambda a:False, [1,2]) == []
range(1, -1) == []
re.findall('x', '') == []
In contrast if you are trying to get a single object, you have no choice but to return None I suppose. So I guess None is like the empty container for single objects! Thanks to KennyTM for arguing some sense into me :D
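A small sketch of the "always return a container" rule applied to the question. The function apply_grammar is a hypothetical stand-in for o.apply, and describe is a made-up caller; the point is only that the caller's key checks work unchanged on an empty dict.

```python
def apply_grammar(text):
    """Hypothetical stand-in for o.apply(): always returns a dict, possibly empty."""
    result = {}
    if "imperial" in text:
        result["imperial"] = text
    elif "west" in text:
        result["west"] = text
    return result

def describe(result):
    # No `result is not None` guard is needed: `in` works on an empty dict.
    if "imperial" in result:
        return "imperial"
    elif "west" in result:
        return "west"
    return "nothing matched"

hit = describe(apply_grammar("imperial march"))
miss = describe(apply_grammar("something else"))
```

If the caller must tell "no match" apart from "the operation failed", an empty dict alone cannot carry that, and raising an exception for the failure case is the usual way to keep the two separate.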
Python supports returning multiple values. Hence, you can return a status of success along with an empty dictionary. The caller first verifies the return code and then uses the dictionary.
def result_none(choice):
    mydict = {}
    retcode = -1
    if choice == 'a':
        mydict['x'] = 100
        mydict['y'] = 1000
        retcode = 0
    return mydict, retcode

mydict, retcode = result_none('a')  # unpack in the same order as the return statement
if retcode == 0:
    pass  # <<use dictionary>>