I have a line of code like this:
mydict['description_long'] = another_dict['key1'][0]['a_really_long_key']['another_long_key']['another_long_key3']['another_long_key4']['another_long_key5']
How do I format it so it adheres to the PEP8 guidelines?
The only relevant part of PEP8's style guidelines here is line length. Just break up the dict keys into their own separate lines. This makes the code way easier to read as well.
mydict['description_long'] = (another_dict['key1']
                                          [0]
                                          ['a_really_long_key']
                                          [etc.])
I think I'd do something like this: add parentheses so the expression can span multiple lines:
mydict['description_long'] = (
    another_dict['key1'][0]['a_really_long_key']['another_long_key']
        ['another_long_key3']['another_long_key4']['another_long_key5'])
Though it'd be better not to have such a deep structure in the first place, or to split up the lookup into several, if you can give those good names:
item = another_dict['key1'][0]['a_really_long_key']
part_name = item['another_long_key']['another_long_key3']
detail = part_name['another_long_key4']['another_long_key5']
At least that way the deep structure is documented a little.
Each [ is a bracket, so nominally it's just like nesting parentheses:
mydict['description_long'] = another_dict['key1'][0][
    'a_really_long_key']['another_long_key'][
    'another_long_key3']['another_long_key4'][
    'another_long_key5']
A more generic way is to treat the lookup itself as data: represent the path to the child node as a list and iterate over it to expand the child data structures. For example, your child node can be found by following the path represented by this list:
keypath = ['key1', 0, 'a_really_long_key', 'another_long_key',
           'another_long_key3', 'another_long_key4',
           'another_long_key5']
so you reference your final node by something like:
def resolve_child(root, path):
    for e in path:
        root = root[e]
    return root

mydict['description_long'] = resolve_child(another_dict, keypath)
Or if you want to be all functional (note that reduce() was moved to functools in Python 3):
mydict['description_long'] = reduce(lambda p, c: p[c], keypath, another_dict)
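In Python 3 that one-liner needs the functools import. A minimal runnable sketch, using a shortened key path and made-up sample data:

```python
from functools import reduce

# made-up sample data with a shortened path, for illustration only
another_dict = {'key1': [{'a_really_long_key': {'another_long_key': 'value'}}]}
keypath = ['key1', 0, 'a_really_long_key', 'another_long_key']

# each step indexes the accumulator with the next key in the path
result = reduce(lambda p, c: p[c], keypath, another_dict)
print(result)  # value
```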
It is usually rare that you have to explicitly reference a deeply nested structure like that; usually the structure has been instantiated by some function, like json.load or lxml.objectify.
Related
I need to sort my query string by different parameters.
For example:
[url]/handler?sort=-key1,key2
Now, the sorting happens on a list of dicts on each dict's metadata like so:
sorted(results_list,
       key=lambda obj: [(-obj["metadata"][x[0]] if x[1] == "desc"
                         else obj["metadata"][x[0]]) for x in sort])
The parameters are digested from my handler and passed to the logic manager to process the sorting. This code is python 3. I want to make this sorting mechanism more extensible. Currently, it won't allow any sorting to happen that is:
Outside the metadata property scope.
Not a number (for example, sorting alphabetically).
I should say that I devoured the answers here and saw that many of them suggest overriding the cmp function and basically recreating my own sorting module from scratch. I don't believe that what I'm looking for hasn't been done before, or that it merits that kind of implementation.
I could be wrong. We always learn.
So, how should I redesign this code to be more extensible?
Something like this:
You should be able to pass a list of SortInstruction objects via your API (handler); get_sorted_list will take care of the actual sorting.
from typing import NamedTuple
lst = [{'meta': {'k': 12}, 'k1': 66, 'k2': 'jack'},
       {'meta': {'k': 99}, 'k1': 656, 'k2': 'zoo'},
       {'meta': {'k': 134}, 'k1': 166, 'k2': 'dan'}]

class SortInstruction(NamedTuple):
    in_meta_data: bool
    reverse: bool
    field_name: str

def get_sorted_list(sort_instructions):
    result = lst
    # apply instructions last-to-first: sorted() is stable, so the
    # first instruction ends up as the primary sort key
    for si in reversed(sort_instructions):
        result = sorted(result,
                        key=lambda x: x['meta'][si.field_name] if si.in_meta_data
                        else x[si.field_name],
                        reverse=si.reverse)
    return result

sorted_list = get_sorted_list([SortInstruction(True, False, 'k'),
                               SortInstruction(False, False, 'k1')])
for entry in sorted_list:
    print(entry)
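To connect this back to the URL scheme in the question, here is one way the handler might digest a value like "-k,k1" into SortInstruction tuples. This is a sketch: the NamedTuple is repeated so the snippet runs on its own, and META_FIELDS (the set of field names that live under the 'meta' key) is an invented assumption:

```python
from typing import NamedTuple

class SortInstruction(NamedTuple):
    in_meta_data: bool
    reverse: bool
    field_name: str

# hypothetical: the field names that live under the 'meta' key
META_FIELDS = {'k'}

def parse_sort_param(sort_param):
    """Turn a query value like '-k,k1' into SortInstruction tuples.

    A leading '-' marks a descending sort, mirroring the URL scheme
    in the question."""
    instructions = []
    for field in sort_param.split(','):
        reverse = field.startswith('-')
        name = field.lstrip('-')
        instructions.append(SortInstruction(name in META_FIELDS, reverse, name))
    return instructions

print(parse_sort_param('-k,k1'))
```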
To keep the code readable and easy to maintain as you extend the sorting logic, design it in a SOLID manner.
I'd advise:
Handler digests the parameters from the URL to a context object that holds them all.
Handler passes the context object to the LogicManager, which detects which list to sort and in what way.
LogicManager sends the list to be sorted, along with the sorting manner as originally described in the URL, to a Sorter that sorts the list as described.
In this way, each class has a single purpose and the code is easier to maintain.
Nonetheless, you can scale sorting by adding more Sorter threads or microservices.
Using plistlib to load a plist file in Python, I have a data structure to work with wherein a given path to a key-value pair should never fail, so it's acceptable IMO to hard-code the path without .get() and other tricks -- however, it is a long and ugly path. The plist is full of dicts in arrays in dicts, so it ends up looking like this:
def input_user_data(plist, user_text):
    new_text = clean_user_data(user_text)
    plist['template_data_array'][0]['template_section']['section_fields_data'][0]['disclaimer_text'] = new_text  # do not like
Apart from being way past the 79 character limit, it just looks lugubrious and sophomoric. However it seems equally silly to step through it like this:
#....
one = plist['template_data_array']
two = one[0]['template_section']['section_fields_data']
two[0]['disclaimer_text'] = new_text
...because I don't really need all those assignments, I'm just looking to sanitize the user text and toss it into the predefined section of a plist.
When dealing with a nested path that will always exist but is just tedious to access (and may indeed need to be found again by other methods), is there a shorter technique to employ, or do I just grin and bear the lousy nested structure that I have no control over?
When you see a lot of duplicated or boilerplate code, this is often a hint that you can refactor the repetitive operations into a function. Writing get_node and set_node helper functions not only makes the code that sets the values simpler, it allows you to easily define the paths as constants, which you can put all in one place in your code for easier maintenance.
def get_node(container, path):
    for node in path:
        container = container[node]
    return container

def set_node(container, path, value):
    container = get_node(container, path[:-1])
    container[path[-1]] = value

DISCLAIMER_PATH = ("template_data_array", 0, "template_section", "section_fields_data",
                   0, "disclaimer_text")

set_node(plist, DISCLAIMER_PATH, new_text)
Potentially, you could subclass plist's class to have these as methods, or even to override __getitem__ and __setitem__, which would be convenient.
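A sketch of what such a subclass could look like. This is not plistlib's API; the class name and method names are invented here, and named methods are used rather than overriding __getitem__, which would collide with ordinary key access:

```python
class PathDict(dict):
    """dict subclass that accepts a tuple of keys/indices as a path."""

    def get_path(self, path):
        node = self
        for key in path:
            node = node[key]
        return node

    def set_path(self, path, value):
        # walk to the parent of the target, then assign the last key
        self.get_path(path[:-1])[path[-1]] = value

plist = PathDict({'template_data_array': [{'template_section':
            {'section_fields_data': [{'disclaimer_text': ''}]}}]})
DISCLAIMER_PATH = ('template_data_array', 0, 'template_section',
                   'section_fields_data', 0, 'disclaimer_text')
plist.set_path(DISCLAIMER_PATH, 'new text')
print(plist.get_path(DISCLAIMER_PATH))  # new text
```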
I was looking for a way to create multiple ad-hoc copies of a dictionary to hold some "evolutionary states", with just slight generational deviations and found this little prototype-dict:
class ptdict(dict):
    def asprototype(self, arg=None, **kwargs):
        clone = self.__class__(self.copy())
        if isinstance(arg, (dict, ptdict)):
            clone.update(arg)
        clone.update(self.__class__(kwargs))
        return clone
Basically I want something like:
generation0 = dict(propertyA="X", propertyB="Y")
generations = [generation0]
while not endofevolution():
    # prev. generation = template for next generation:
    nextgen = generations[-1].mutate(propertyB="Z", propertyC="NEW")
    generations.append(nextgen)
and so on.
I was wondering if the author of this class and I were missing something, because I just can't imagine that there's no standard-library approach for this. But neither collections nor itertools seems to provide a similarly simple approach.
Can something like this be accomplished with itertools.tee?
Update:
It's not a question of copy & update, because that's exactly what this ptdict is doing. But update doesn't return a dict, while ptdict does, so I can, for example, chain results or do in-place tests, which enhances readability quite a bit. (My provided example is maybe a bit too trivial, but I didn't want to confuse things with big matrices.)
I apologise for not having been precise enough. Maybe the following example clarifies why I'm interested in getting a dictionary from a single copy/update step:
nextgen = (nextgen.mutate(inject_mutagen("A"))
           if nextgen.mutate(inject_mutagen("A")).get("alive")
           else nextgen.mutate(inject_mutagen("C")))
I guess you're looking for something like this:
first = {'x': 1, 'y': 100, 'foo': 'bar'}
second = dict(first, x=2, y=200)  # {'y': 200, 'x': 2, 'foo': 'bar'}
See the documentation for dict.
You can do it right away without custom types. Just use dict and instead of:
nextgen = generations[-1].mutate(propertyB="Z", propertyC="NEW")
do something like this:
nextgen = generations[-1].copy() # "clone" previous generation
nextgen.update(propertyB="Z", propertyC="NEW") # update properties of this gen.
and this should be enough, as long as you don't have nested dictionaries and don't need a deep copy instead of a shallow one.
The copy module contains functions for shallow and deep copying.
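The distinction matters as soon as a generation contains nested dicts. A small illustration of shallow vs. deep copying:

```python
import copy

gen0 = {'propertyA': 'X', 'nested': {'rate': 1}}

shallow = gen0.copy()          # the inner dict is shared with gen0
shallow['nested']['rate'] = 2  # ...so this also changes gen0

deep = copy.deepcopy(gen0)     # a fully independent clone
deep['nested']['rate'] = 3     # gen0 is unaffected by this

print(gen0['nested']['rate'])  # 2 (mutated via the shallow copy)
```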
I have a function to port from another language, could you please help me make it "pythonic"?
Here is the function, ported in a "non-pythonic" way (this is a bit of an artificial example: every task is associated with a project or None, and we need a list of distinct projects, where distinct means no duplication of the .identifier property, starting from a list of tasks):
@staticmethod
def get_projects_of_tasks(task_list):
    projects = []
    project_identifiers_seen = {}
    for task in task_list:
        project = task.project
        if project is None:
            continue
        project_identifier = project.identifier
        if project_identifiers_seen.has_key(project_identifier):
            continue
        project_identifiers_seen[project_identifier] = True
        projects.append(project)
    return projects
I have specifically not even started to make it "pythonic", so as not to start off on the wrong foot (e.g. a list comprehension with "if project.identifier is not None", filter() based on a predicate that looks up the dictionary-based registry of identifiers, using set() to strip duplicates, etc.).
EDIT:
Based on the feedback, I have this:
@staticmethod
def get_projects_of_tasks(task_list):
    projects = []
    project_identifiers_seen = set()
    for task in task_list:
        project = task.project
        if project is None:
            continue
        project_identifier = project.identifier
        if project_identifier in project_identifiers_seen:
            continue
        project_identifiers_seen.add(project_identifier)
        projects.append(project)
    return projects
There's nothing massively unPythonic about this code. A couple of possible improvements:
project_identifiers_seen could be a set, rather than a dictionary.
foo.has_key(bar) is better spelled bar in foo
I'm suspicious that this is a staticmethod of a class. Usually there's no need for a class in Python unless you're actually doing data encapsulation. If this is just a normal function, make it a module-level one.
What about:
project_list = {task.project.identifier: task.project
                for task in task_list if task.project is not None}
return project_list.values()
For Python 2.6 and earlier, use the dict constructor instead:
return dict((x.project.id, x.project) for x in task_list if x.project).values()
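To see the deduplication in action, here is a quick runnable sketch of the comprehension; the task and project objects are stand-ins built with SimpleNamespace, purely for illustration:

```python
from types import SimpleNamespace as NS

# invented sample data: two tasks share a project, one has none
p1 = NS(identifier=1, name='alpha')
p2 = NS(identifier=2, name='beta')
tasks = [NS(project=p1), NS(project=None), NS(project=p1), NS(project=p2)]

# later tasks with the same identifier overwrite earlier ones,
# so each identifier appears exactly once
project_list = {t.project.identifier: t.project
                for t in tasks if t.project is not None}
print(sorted(project_list))  # [1, 2]
```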
def get_projects_of_tasks(task_list):
    seen = set()
    return [seen.add(task.project.identifier) or task.project  # add() always returns None
            for task in task_list
            if task.project is not None and task.project.identifier not in seen]
This works because (a) add returns None (and or returns the value of the last expression evaluated) and (b) the mapping clause (the first clause) is only executed if the if clause is True.
There is no reason that it has to be in a list comprehension - you could just as well set it out as a loop, and indeed you may prefer to. This way has the advantage that it is clear that you are just building a list, and what is supposed to be in it.
I've not used staticmethod because there is rarely a need for it. Either have this as a module-level function, or a classmethod.
An alternative is a generator (thanks to @delnan for pointing this out):
def get_projects_of_tasks(task_list):
    seen = set()
    for task in task_list:
        if task.project is not None and task.project.identifier not in seen:
            identifier = task.project.identifier
            seen.add(identifier)
            yield task.project
This eliminates the need for a side-effect in comprehension (which is controversial), but keeps clear what is being collected.
For the sake of avoiding another if/continue construction, I have left in two accesses to task.project.identifier. This could be conveniently eliminated with the use of a promise library.
This version uses promises to avoid repeated access to task.project.identifier without the need to include an if/continue:
from peak.util.proxies import LazyProxy  # pip install ProxyTypes

def get_projects_of_tasks(task_list):
    seen = set()
    for task in task_list:
        identifier = LazyProxy(lambda: task.project.identifier)  # a transparent promise
        if task.project is not None and identifier not in seen:
            seen.add(identifier)
            yield task.project
This is safe from AttributeErrors because task.project.identifier is never accessed before task.project is checked.
Some say EAFP is pythonic, so:
@staticmethod
def get_projects_of_tasks(task_list):
    projects = {}
    for task in task_list:
        try:
            if task.project.identifier not in projects:
                projects[task.project.identifier] = task.project
        except AttributeError:
            pass
    return projects.values()
Of course an explicit check wouldn't be wrong either, and would be better if many tasks have no project.
And a single dict to keep track of both the seen identifiers and the projects is enough; if the order of the projects matters, an OrderedDict (Python 2.7+) could come in handy.
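A sketch of that order-preserving, single-dict variant using OrderedDict; the task and project objects here are invented stand-ins, and setdefault replaces the explicit membership check:

```python
from collections import OrderedDict
from types import SimpleNamespace as NS

# invented sample data: a duplicate project and a task with no project
p1 = NS(identifier='a')
p2 = NS(identifier='b')
task_list = [NS(project=p1), NS(project=None), NS(project=p2), NS(project=p1)]

def get_projects_of_tasks(task_list):
    projects = OrderedDict()
    for task in task_list:
        try:
            # setdefault keeps the first project seen for each identifier
            projects.setdefault(task.project.identifier, task.project)
        except AttributeError:  # task.project is None
            pass
    return list(projects.values())

print([p.identifier for p in get_projects_of_tasks(task_list)])  # ['a', 'b']
```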
There are already a lot of good answers, and, indeed, you have accepted one! But I thought I would add one more option. A number of people have seen that your code could be made more compact with generator expressions or list comprehensions. I'm going to suggest a hybrid style that uses generator expressions to do the initial filtering, while maintaining your for loop in the final filter.
The advantage of this style over the style of your original code is that it simplifies the flow of control by eliminating continue statements. The advantage of this style over a single list comprehension is that it avoids multiple accesses to task.project.identifier in a natural way. It also handles mutable state (the seen set) transparently, which I think is important.
def get_projects_of_tasks(task_list):
    projects = (task.project for task in task_list)
    ids_projects = ((p.identifier, p) for p in projects if p is not None)
    seen = set()
    unique_projects = []
    for id, p in ids_projects:
        if id not in seen:
            seen.add(id)
            unique_projects.append(p)
    return unique_projects
Because these are generator expressions (enclosed in parentheses instead of brackets), they don't build temporary lists. The first generator expression creates an iterable of projects; you could think of it as performing the project = task.project line from your original code on all the projects at once. The second generator expression creates an iterable of (project_id, project) tuples. The if clause at the end filters out the None values; (p.identifier, p) is only evaluated if p passes through the filter. Together, these two generator expressions do away with your first two if blocks. The remaining code is essentially the same as yours.
Note also the excellent suggestion from Marcin/delnan that you create a generator using yield. This cuts down further on the verbosity of your code, boiling it down to its essentials:
def get_projects_of_tasks(task_list):
    projects = (task.project for task in task_list)
    ids_projects = ((p.identifier, p) for p in projects if p is not None)
    seen = set()
    for id, p in ids_projects:
        if id not in seen:
            seen.add(id)
            yield p
The only disadvantage -- in case this isn't obvious -- is that if you want to permanently store the projects, you have to pass the resulting iterable to list.
projects_of_tasks = list(get_projects_of_tasks(task_list))
I was writing some Python 3.2 code and this question came to me:
I've got these variables:
# a list of xml.dom nodes (this is just an example!)
child_nodes = [node1, node2, node3]
# I want to add every item in child_nodes to this node (also an xml.dom Node)
parent = xml_document.createElement('TheParentNode')
This is exactly what I want to do:
for node in child_nodes:
    if node is not None:
        parent.appendChild(node)
I wrote it in one line like this:
[parent.appendChild(c) for c in child_nodes if c is not None]
I'm not going to use the list result at all, I only need the appendChild to do its work.
I'm not a very experienced Python programmer, so I wonder which one is better.
I like the single-line solution, but I would like to hear from experienced Python programmers:
Which one is better, in terms of both code beauty/maintainability and performance/memory use?
The former is preferable in this situation.
The latter is called a list comprehension and creates a new list object full of the results of each call to parent.appendChild(c). And then discards it.
However, if you want to make a list based on this kind of iteration, then you should certainly employ a list comprehension.
The question of code beauty/maintainability is a tricky one. It's really up to you and whoever you work with to decide.
For a long time I was uncomfortable with list comprehensions and so on and preferred writing it the first way because it was easier for me to read. Some people I work with, however, prefer the second method.