Using plistlib to load a plist file in Python, I have a data structure to work with wherein a given path to a key-value pair should never fail, so it's acceptable IMO to hard-code the path without .get() and other tricks -- however, it is a long and ugly path. The plist is full of dicts in arrays in dicts, so it ends up looking like this:
def input_user_data(plist, user_text):
    new_text = clean_user_data(user_text)
    plist['template_data_array'][0]['template_section']['section_fields_data'][0]['disclaimer_text'] = new_text  # do not like
Apart from being way past the 79 character limit, it just looks lugubrious and sophomoric. However it seems equally silly to step through it like this:
#....
one = plist['template_data_array']
two = one[0]['template_section']['section_fields_data']
two[0]['disclaimer_text'] = new_text
...because I don't really need all those assignments, I'm just looking to sanitize the user text and toss it into the predefined section of a plist.
When dealing with a nested path that will always exist but is just tedious to access (and may indeed need to be found again by other methods), is there a shorter technique to employ, or do I just grin and bear the lousy nested structure that I have no control over?
When you see a lot of duplicated or boilerplate code, this is often a hint that you can refactor the repetitive operations into a function. Writing get_node and set_node helper functions not only makes the code that sets the values simpler, it allows you to easily define the paths as constants, which you can put all in one place in your code for easier maintenance.
def get_node(container, path):
    for node in path:
        container = container[node]
    return container

def set_node(container, path, value):
    container = get_node(container, path[:-1])
    container[path[-1]] = value

DISCLAIMER_PATH = ("template_data_array", 0, "template_section",
                   "section_fields_data", 0, "disclaimer_text")

set_node(plist, DISCLAIMER_PATH, new_text)
Potentially, you could subclass dict (which is what plistlib hands back) to have these as methods, or even override __getitem__ and __setitem__ to accept whole paths, which would be convenient.
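For instance, a minimal sketch of the __getitem__/__setitem__ idea, reusing the helpers above (the wrapper class name and file name are illustrative):

import plistlib

class PathDict(dict):
    # A dict that also accepts tuple paths as keys.
    def __getitem__(self, key):
        if isinstance(key, tuple):
            return get_node(self, key)
        return dict.__getitem__(self, key)

    def __setitem__(self, key, value):
        if isinstance(key, tuple):
            set_node(self, key, value)
        else:
            dict.__setitem__(self, key, value)

with open("template.plist", "rb") as f:  # hypothetical file name
    plist = PathDict(plistlib.load(f))
plist[DISCLAIMER_PATH] = new_text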
I have a line of code like this:
mydict['description_long'] = another_dict['key1'][0]['a_really_long_key']['another_long_key']['another_long_key3']['another_long_key4']['another_long_key5']
How do I format it so it adheres to the PEP8 guidelines?
The only relevant part of PEP8's style guidelines here is line length. Just break up the dict keys into their own separate lines. This makes the code way easier to read as well.
mydict['description_long'] = (another_dict['key1']
                                          [0]
                                          ['a_really_long_key']
                                          [etc.])
I think I'd do something like this: add parentheses so the expression can span multiple lines:
mydict['description_long'] = (
    another_dict['key1'][0]['a_really_long_key']['another_long_key']
    ['another_long_key3']['another_long_key4']['another_long_key5'])
Though it'd be better not to have such a deep structure in the first place, or to split the lookup up into several steps, if you can give those good names:
item = another_dict['key1'][0]['a_really_long_key']
part_name = item['another_long_key']['another_long_key3']
detail = part_name['another_long_key4']['another_long_key5']
At least that way the deep structure is documented a little.
Each [ is a bracket, so it works just like nested parentheses: you can break the line after any open bracket:
mydict['description_long'] = another_dict['key1'][0][
    'a_really_long_key']['another_long_key'][
    'another_long_key3']['another_long_key4'][
    'another_long_key5']
A more generic way might be to treat the lookup as data and use iteration to walk down into the child data structures. For example, your child node can be found by following a path represented by the list:
keypath = ['key1', 0, 'a_really_long_key', 'another_long_key',
           'another_long_key3', 'another_long_key4',
           'another_long_key5']
so you reference your final node by something like:
def resolve_child(root, path):
    for e in path:
        root = root[e]
    return root

mydict['description_long'] = resolve_child(another_dict, keypath)
Or if you want to be all functional (note that reduce() has moved to functools in Python 3):
mydict['description_long'] = reduce(lambda p, c: p[c], keypath, another_dict)
It is usually rare that you have to explicitly reference a deeply nested structure like that; usually the structure is being instantiated by some function, like json.loads or lxml.objectify.
There are bits of code that I'd like to customize. Say I want to assign a bunch of student applications to a summer program to various readers (so 100 apps, 3 readers, divide them roughly evenly, etc). In some cases, I want to take reader preferences into consideration (I only want to read applications from students in California, etc.). In other cases, I don't care who they are assigned to. Right now I've got something that looks roughly like this:
def assign_readers(preferences_flag):
    if preferences_flag:
        assign_readers_with_preferences()
    assign_remaining()
I've got multiple cases of similar features throughout my code that I would like to easily turn on/off, but this doesn't seem like a particularly clean way of doing it. The same flag is sometimes used in other parts of the code, so I'm passing these flags around left and right. For example:
def log_reader_stats(preferences_flag, other_flag):
    if preferences_flag:
        log_reader_stats_with_preferences()
    if other_flag:
        log_reader_stats_with_other_stuff()
    log_remaining_stats()
What is an alternative way of doing this? Passing flags around seems repetitive and inefficient, but other than this I'm not sure how I can "toggle" such features on and off.
Below is an example of how some of the actual code is being used, and how the flags come into play.
USE_PREF = True
USE_SPEC_GRP = True

def main():
    # Load and store config file information
    fnames = {}
    snames = {}
    options = read_config_file()
    validate_config_params(options, fnames, snames)

    # Load the applications file
    apps = pio.file_to_frame(fnames["apps"], snames["apps"])

    # Load the target and max number of apps each reader can handle
    rdr_counts = pio.file_to_frame(fnames["rdr_counts"], snames["rdr_counts"])

    # Assign applications depending on which options are enabled
    if USE_SPEC_GRP:
        assign_all_special_groups(apps, fnames["spec_grp"], snames["spec_grp"])
    if USE_PREF:
        assign_apps_with_prefs(apps, rdr_counts,
                               fnames["rdr_prefs"], snames["rdr_prefs"])
    assign_remaining_apps(apps, rdr_counts, fnames, snames)
Although you didn't ask this, there is a code smell here that warrants explanation. Whenever you find yourself using parallel data structures like fnames and snames, as in:
assign_all_special_groups(apps, fnames["spec_grp"], snames["spec_grp"])
you are usually writing error-prone code. Instead you could have
names['spec_grp'] = ('something', 'anotherthing')
which ensures that the elements of spec_grp always remain associated with each other. There exists a namedtuple type which makes access very readable:
names['spec_grp'].f_thing
names['spec_grp'].s_thing
but without getting into that slight complexity you'll need to access them with
names['spec_grp'][0]
names['spec_grp'][1]
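For illustration, a sketch of the namedtuple variant (the type and field names here are placeholders, not from your code):

from collections import namedtuple

# Field names are placeholders; pick names that describe what
# fnames and snames actually hold.
FilePair = namedtuple("FilePair", ["f_thing", "s_thing"])

names = {}
names['spec_grp'] = FilePair("something", "anotherthing")
names['spec_grp'].f_thing   # -> 'something'
names['spec_grp'].s_thing   # -> 'anotherthing'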
If I am reading your intent properly, the code above could combine these values with the option flags so that
options['spec_grp'] = (fname_for_spec_grp, sname_for_spec_grp)

if options['spec_grp']:
    assign_all_special_groups(apps, options["spec_grp"][0], options["spec_grp"][1])
This makes it important to initialize configuration elements that have no value to None, but that is good practice anyway.
But didn't I just make your calling code longer and harder to read? Kinda. Did it buy you flexibility, maintainability, and safety for a few extra characters? Yep. And it turns three data structures (options, fnames, snames) into one dictionary that signals whether an option is desired and, if so, what its arguments are.
You can simply create a class ReadersManager with a flags attribute, make the functions methods of that class, and access self.flags inside them.
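A rough sketch of that idea (all names here are illustrative, not from your code):

class ReadersManager(object):
    def __init__(self, use_preferences=False, use_special_groups=False):
        # Flags are stored once; every method consults them via self.
        self.use_preferences = use_preferences
        self.use_special_groups = use_special_groups

    def assign_readers(self, apps):
        if self.use_preferences:
            self.assign_readers_with_preferences(apps)
        self.assign_remaining(apps)

    def assign_readers_with_preferences(self, apps):
        pass  # preference-aware assignment goes here

    def assign_remaining(self, apps):
        pass  # default assignment goes here

manager = ReadersManager(use_preferences=True)
manager.assign_readers(apps)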
The best way is to look at how others have done it -- in this case, the ConfigParser standard module, which uses dictionaries to store and retrieve configuration data (well, actually dictionaries of dictionaries, but we needn't go that far). The key point is that a dictionary can associate a name with almost anything, and using data to describe the location of data is far better than hardcoding it.
In your case, the dictionary might start out as:
options = {
    'USE_SPEC_GROUP': False,
    'USE_PREF': False,
}
but it is a dictionary, so I can add to it as needed:
options['available'] = False
or even do bulk initialization easily:
options = {}
for option in "car plane train boat".split():
    options[option] = False
Of course, accessing them in conditionals is easy:
if options['boat']:
    # do boat things
And now you have one variable to pass around that contains all the configuration data:
some_function(options)
No need to use a class when a fundamental type like dict is so useful on its own.
I have a bunch of File objects, and a bunch of Folder objects. Each folder has a list of files. Now, sometimes I'd like to lookup which folder a certain file is in. I don't want to traverse over all folders and files, so I create a lookup dict file -> folder.
folder = Folder()
myfile = File()
folder_lookup = {}
# This is pseudocode, I don't actually reach into the Folder
# object, but have an appropriate method
folder.files.append(myfile)
folder_lookup[myfile] = folder
Now, the problem is, the files are mutable objects. My application is built around that fact. I change properties on them, and the GUI is notified and updated accordingly. Of course you can't put mutable objects in dicts. So what I tried first is to generate a hash based on the current content, basically:
def __hash__(self):
    return hash((self.title, ...))
This didn't work of course, because when the object's contents changed, its hash (and thus its identity) changed, and everything got messed up. What I need is an object that keeps its identity although its contents change. I tried various things, like making __hash__ return id(self), overriding __eq__, and so on, but never found a satisfying solution. One complication is that the whole construction should be picklable, which means I'd have to store the id on creation, since it could change when pickling, I guess.
So I basically want to use the identity of an object (not its state) to quickly look up data related to the object. I've actually found a really nice pythonic workaround for my problem, which I might post shortly, but I'd like to see if someone else comes up with a solution.
I felt dirty writing this. Just put folder as an attribute on the file.
class dodgy(list):
    def __init__(self, title):
        self.title = title
        super(dodgy, self).__init__()
        self.store = type("store", (object,), {"blanket": self})

    def __hash__(self):
        return hash(self.store)

innocent_d = {}
dodge_1 = dodgy("dodge_1")
dodge_2 = dodgy("dodge_2")
innocent_d[dodge_1] = dodge_1.title
innocent_d[dodge_2] = dodge_2.title
print(innocent_d[dodge_1])
dodge_1.extend(range(5))
dodge_1.title = "oh no"
print(innocent_d[dodge_1])
OK, everybody noticed the extremely obvious workaround (that took me some days to come up with): just put an attribute on File that tells you which folder it is in. (Don't worry, that is also what I did.)
But, it turns out that I was working under wrong assumptions. You are not supposed to use mutable objects as keys, but that doesn't mean you can't (diabolic laughter)! The default implementation of __hash__ returns a unique value, probably derived from the object's address, that remains constant over the object's lifetime. And the default __eq__ follows the same notion of object identity.
So you can put mutable objects in a dict, and they work as expected (if you expect equality based on instance, not on value).
See also: I'm able to use a mutable object as a dictionary key in python. Is this not disallowed?
I was having problems because I was pickling/unpickling the objects, which of course changed the hashes. One could generate a unique ID in the constructor, and use that for equality and deriving a hash to overcome this.
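A sketch of that idea, assuming the standard uuid module for the unique ID (the class layout is illustrative):

import uuid

class File(object):
    def __init__(self, title):
        self.title = title
        # Assigned once at creation; survives pickling because it is
        # ordinary instance state, unlike id(self).
        self._uid = uuid.uuid4()

    def __eq__(self, other):
        return isinstance(other, File) and self._uid == other._uid

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return hash(self._uid)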
(For the curious, as to why such a "lookup based on instance identity" dict might be necessary: I've been experimenting with a kind of "object database". You have pure python objects, put them in lists/containers, and can define indexes on attributes for faster lookup, complex queries and so on. For foreign keys (1:n relationships) I can just use containers, but for the backlink I have to come up with something clever if I don't want to modify the objects on the n side.)
This is a question about a clean, pythonic way to juggle some different instance methods.
I have a class that operates a little differently depending on certain inputs. The differences didn't seem big enough to justify producing entirely new classes. I have to interface the class with one of several data "providers". I thought I was being smart when I introduced a dictionary:
self.interface_tools = {'TYPE_A': {... various ..., 'data_supplier': self.current_data},
                        'TYPE_B': {... various ..., 'data_supplier': self.predicted_data}}
Then, as part of the class initialization, I have an input "source_name" and I do ...
# ... various ....
self.data_supplier = self.interface_tools[source_name]['data_supplier']
self.current_data and self.predicted_data need the same input parameter, so when it comes time to call the method, I don't have to distinguish them. I can just call
new_data = self.data_supplier(param1)
But now I need to interface with a new data source -- call it "TYPE_C" -- and it needs more input parameters. There are ways to do this, but nothing I can think of is very clean. For instance, I could just add the new parameters to the old data_suppliers and never use them, so then the call would look like
new_data = self.data_supplier(param1,param2,param3)
But I don't like that. I could add an if block
if self.data_source != 'TYPE_C':
    new_data = self.data_supplier(param1)
else:
    new_data = self.data_c_supplier(param1, param2, param3)
but avoiding if blocks like this was exactly what I was trying to do in the first place with that dictionary I came up with.
So the upshot is: I have a few "data_supplier" routines. Now that my project has expanded, they have different input lists. But I want my class to be able to treat them all the same to the extent possible. Any ideas? Thanks.
Sounds like your functions could make use of variable-length argument lists (*args and **kwargs).
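A sketch of what that could look like for the suppliers (the class skeleton and placeholder bodies are illustrative):

class Interface(object):
    def current_data(self, param1, *args, **kwargs):
        # Extra positional/keyword arguments are accepted and ignored.
        pass

    def data_c_supplier(self, param1, param2=None, param3=None, **kwargs):
        pass

Every supplier then tolerates the full argument list, so the call site can stay uniform: new_data = self.data_supplier(param1, param2, param3).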
That said, you could also just make subclasses. They're fairly cheap to make, and would solve your problem here. This is pretty much the case they were designed for.
You could make all your data_suppliers accept a single argument and make it a dictionary, a list, or even a namedtuple.
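A minimal sketch of that approach (the placeholder computations are just for illustration):

def current_data(params):
    # Uses only what it needs; extra keys are ignored.
    return params['param1'] * 2

def data_c_supplier(params):
    return params['param1'] + params['param2'] + params['param3']

suppliers = {'TYPE_A': current_data, 'TYPE_C': data_c_supplier}
new_data = suppliers['TYPE_C']({'param1': 1, 'param2': 2, 'param3': 3})  # -> 6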
I'm creating a networked server for a boggle-clone I wrote in python, which accepts users, solves the boards, and scores the player input. The dictionary file I'm using is 1.8MB (the ENABLE2K dictionary), and I need it to be available to several game solver classes. Right now, I have it so that each class iterates through the file line-by-line and generates a hash table (associative array), but the more solver classes I instantiate, the more memory it takes up.
What I would like to do is import the dictionary file once and pass it to each solver instance as they need it. But what is the best way to do this? Should I import the dictionary in the global space, then access it in the solver class as globals()['dictionary']? Or should I import the dictionary then pass it as an argument to the class constructor? Is one of these better than the other? Is there a third option?
If you create a dictionary.py module, containing code which reads the file and builds a dictionary, this code will only be executed the first time it is imported. Further imports will return a reference to the existing module instance. As such, your classes can:
import dictionary
dictionary.words[whatever]
where dictionary.py has:
words = {}
# read file and add to 'words'
Even though it is essentially a singleton at this point, the usual arguments against globals apply. For a pythonic singleton-substitute, look up the "borg" object.
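For reference, the classic borg pattern looks roughly like this (the WordList subclass is an illustrative adaptation, not code from the question):

class Borg(object):
    # All instances share one __dict__, so state set through any
    # instance is visible through every other instance.
    _shared_state = {}

    def __init__(self):
        self.__dict__ = self._shared_state

class WordList(Borg):
    def __init__(self):
        Borg.__init__(self)
        if not hasattr(self, 'words'):
            self.words = {}  # read the file and populate this once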
That's really the only difference. Once the dictionary object is created, you are only binding new references as you pass it along, unless you explicitly perform a deep copy. It makes sense for it to be centrally constructed once and only once, so long as each solver instance does not require a private copy for modification.
Adam, remember that in Python when you say:
a = read_dict_from_file()
b = a
... you are not actually copying a (and thus using more memory); you are merely making b another reference to the same object.
So basically any of the solutions you propose will be far better in terms of memory usage. Basically, read in the dictionary once and then hang on to a reference to that. Whether you do it with a global variable, or pass it to each instance, or something else, you'll be referencing the same object and not duplicating it.
Which one is most Pythonic? That's a whole 'nother can of worms, but here's what I would do personally:
def main(args):
    run_initialization_stuff()
    dictionary = read_dictionary_from_file()
    solvers = [Solver(solver_class=x, dictionary=dictionary)
               for x in range(number_of_solvers)]
HTH.
Depending on what your dict contains, you may be interested in the 'shelve' or 'anydbm' modules. They give you dict-like interfaces (just strings as keys and items for 'anydbm', and strings as keys and any python object as item for 'shelve') but the data is actually in a DBM file (gdbm, ndbm, dbhash, bsddb, depending on what's available on the platform.) You probably still want to share the actual database between classes as you are asking for, but it would avoid the parsing-the-textfile step as well as the keeping-it-all-in-memory bit.
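A brief sketch with shelve (file names are illustrative, and the word list is assumed to be one word per line):

import shelve

# Build the shelf once, e.g. in a setup script.
with shelve.open('words_db') as db:
    with open('enable2k.txt') as f:
        for line in f:
            db[line.strip()] = True

# Solver instances can then open the shelf read-only instead of
# each building an in-memory dict.
words = shelve.open('words_db', flag='r')
print('boggle' in words)
words.close()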