How many private variables is too many? - python

I'm currently working on my first larger-scale piece of software, and am running into an ugly situation. Most features I have added so far have required an additional private member in order to function properly. This is because most features give more power to the user, letting them modify the program either through arguments passed to the constructor or through methods that toggle a particular setting.
I currently have around 13 private variables, and can see this spiraling out of control. The constructor code is starting to look very ugly. I was wondering if this is just a natural result of adding features, or if there is a clever way to avoid this issue.

I would recommend abstracting the concept of "behavior".
You'd have a base class "behavior" which actually performs the requested action, or manages the modification to behavior. Then you can initialize your code using an array of "parameters" and "behaviors".
Your startup code would become a simple "for" loop, and to add or remove behaviors you just add to or remove from the list.
Of course, the tough part of this is actually fitting the activities of the behavior classes into your overall program flow. But I'm guessing that a focus on "single responsibility principle" would help figure that out.
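A minimal sketch of that idea, with the constructor reduced to a loop (all names here are illustrative, not from the question):

class Behavior:
    """Base class: each optional feature becomes one Behavior."""
    def apply(self, app):
        raise NotImplementedError

class VerboseLogging(Behavior):
    def apply(self, app):
        app.settings["log_level"] = "DEBUG"

class AutoSave(Behavior):
    def __init__(self, interval_seconds):
        self.interval_seconds = interval_seconds

    def apply(self, app):
        app.settings["autosave_interval"] = self.interval_seconds

class App:
    def __init__(self, behaviors=()):
        self.settings = {}
        # Startup is a simple loop; adding or removing a feature
        # means editing the behaviors list, not the constructor.
        for behavior in behaviors:
            behavior.apply(self)

app = App(behaviors=[VerboseLogging(), AutoSave(interval_seconds=60)])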

Related

Functions for adding nested data to model, inside or outside the model?

Suppose I have a class/model building that has a relation to the class/model wall, and this in turn to the class/model window, such that one building can have many walls (surfaces) and a wall can have many windows (one-to-many).
Now, when I want to add windows to that building, perhaps only to certain surfaces, should the functions (including search functions/loops) be written inside the model? Or outside, in a separate class/script that is either called from the model or called from outside?
I could imagine that, when the functionality is part of the model, it could cause problems when changes are needed in the long run.
What is the cleaner architecture/standard, since both could work?
If possible, can you give me a source to read more on this particular problem?
In my case I'm using Python with SQLAlchemy and Postgres, but this question could also be legitimate for other programming languages.
(I hope this question is not too broad / opinion-based.)
For starters, I think this question might have been better asked on Software Engineering. However, I might as well give you my two cents on this.
As so often, it depends ...
Generally, encapsulation is one of the core concepts in object-oriented programming.
Any change to the state of an object should be done by the object itself (although potentially triggered externally) and therefore be guaranteed to comply with the terms and conditions you defined for your object. The behavior of your object should be implemented inside your object, not outside of it.
You don't want to expose your Window's attribute wall publicly for all the world to access it directly. You want to hide it behind getters and setters. You want the Window to refuse being placed on a Wall that is passed to its wall setter if said Wall happens to be 'interior'. You don't want a Person object to change the Window's state from 'open' to 'close' and vice versa directly; you want the Person to call the Window's open() or close() method, e.g. to ensure internally that a closed window is not closed again.
Also, hiding implementation details can help you maintain your interface and make changes to your class transparent. Say, for example, you decide that, in addition to disallowing interior walls, you now also want to prevent "normal" windows from being put into exterior walls in the basement. You can add that check to your existing wall setter in Window, and the only visible change for external code would be another potential reason for refusal ("window=normal and wall=basement" in addition to "wall=interior"). Or you want to add an attribute representing the state of cleanliness of your Window and, to make a proper distinction between the new cleanliness_state and the old 'open'/'close' state, you want to rename the old attribute to open_close_state. With your methods open(), close() (and potentially is_open() and is_closed()) reading from and writing to your 'open'/'close' state attribute, this change just affects your class implementation, not every piece of code that uses it.
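A minimal sketch of these rules, assuming plain Python classes (the wall attribute, open(), close() and is_open() follow the text above; the kind attribute on Wall and the exact errors raised are illustrative):

class Wall:
    def __init__(self, kind):
        self.kind = kind  # e.g. 'interior' or 'exterior'

class Window:
    def __init__(self):
        self._wall = None
        self._state = 'close'

    @property
    def wall(self):
        return self._wall

    @wall.setter
    def wall(self, wall):
        # The Window enforces its own terms and conditions.
        if wall.kind == 'interior':
            raise ValueError('cannot place a window on an interior wall')
        self._wall = wall

    def open(self):
        self._state = 'open'

    def close(self):
        # Ensure internally that a closed window is not closed again.
        if self._state == 'close':
            raise RuntimeError('window is already closed')
        self._state = 'close'

    def is_open(self):
        return self._state == 'open'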
However!
You might have classes that just work as some sort of collection, i.e. data classes. These implement little to no functionality and publicly expose their attributes to be read and written by the whole world, thus broadly ignoring the concept of encapsulation. One could argue that classes/models implemented in an object-relational mapping layer, such as SQLAlchemy, are more data objects/data classes than objects in the OOP sense, especially when used mainly to persist and retrieve structured data. It is not unusual to have external code change the state of such an object or implement its functionality, like the views in the Django framework, which uses its own ORM layer to implement and persist models.
So?
It boils down to your concrete case. You already mentioned that you are considering restricting the placement of windows, probably based on properties of the windows and walls involved.
If you consider your SQLAlchemy models more than just a way of persisting your objects, go ahead and implement the behavior and change logic right in your model. But keep in mind that a) you might end up creating conflicts with methods/properties of your model's base class, and b) the attributes of your models must remain publicly exposed to maintain the functionality of your ORM layer (although SQLAlchemy might be able to work with properties as long as both getter and setter are defined; I have never tested that).
If you want the models to be a rather convenient way of persisting and retrieving your structured data, keep them clean and go for some utility functions or classes that implement your objects' behavior and ensure their contract when used in the code; e.g. have a function place_window_on_wall(window: Window, wall: Wall) that takes care of validation and restrictions when you try to reference a Wall object in your Window's wall attribute. But keep in mind that changes to your models must be reflected in these functions/classes as well.
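A minimal sketch of that second option, assuming the models are plain ORM-style classes with public attributes (place_window_on_wall is named in the text; the kind and is_basement attributes are assumptions for illustration):

class Wall:  # plain data class: attributes stay publicly writable for the ORM
    def __init__(self, kind, is_basement=False):
        self.kind = kind
        self.is_basement = is_basement

class Window:
    def __init__(self, kind='normal'):
        self.kind = kind
        self.wall = None

def place_window_on_wall(window: Window, wall: Wall) -> None:
    # All validation and restrictions live here, outside the models.
    if wall.kind == 'interior':
        raise ValueError('cannot place a window on an interior wall')
    if window.kind == 'normal' and wall.is_basement:
        raise ValueError('normal windows cannot go into basement walls')
    window.wall = wall

place_window_on_wall(Window(), Wall(kind='exterior'))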
I consider both options valid; just whatever you opt for, be consistent with your decision.

Designing an OOP solution

I'm writing code for research purposes, in which I search through a bulk of files and rank them according to their relevance. I call the entire process quickSearching; it is composed of two serial stages: first I search the files and retrieve a list of candidate files, then I score those candidates and rank them.
So a quicksearch is simply a serial combination of a search method and a score method.
I'm planning to implement various searching and scoring methodologies, and I would like to test all possible combinations and evaluate them to see which is the winning combo.
Since the number of combos will grow very fast, it is important to write the code with a good structure and design. I thought about the following designs (I'm writing the code in Python):
1) A quickSearcher class that will receive pointers to a searcher function and a scorer function
2) A quickSearcher class that will receive a searcher object and a scorer object
3) A quickSearcher class that will inherit from a searcher class and a scorer class
Since I'm basically an EE engineer, I'm not sure how to choose between the options, or whether this is a common problem in CS with a well-known design pattern. The design I'm looking for will hopefully:
Be very code-volume efficient, since some of the searching and scoring methods differ only in the value of a parameter or two.
Be very modular and resistant to logical errors.
Be easy to navigate through
Are there any other considerations I should take into account?
This is my first design question, so it might not be valid or might be missing important info; please let me know if that's the case.
Classes are often overused, especially by programmers coming from languages like Java and C# where they are compulsory. I recommend watching the presentation Stop Writing Classes.
When deciding whether to create a class it is useful to ask yourself the following questions:
1) Will the class need to have multiple methods?
If the class only has a single method (apart from __init__) then you may as well make it a function instead. If it needs to preserve state between calls then use a generator. If it needs to be created in one place with some parameters then called elsewhere you can use a closure (a function that returns another function) or functools.partial.
2) Will it need to share state between methods?
If the class does not need to share state between methods then it may be better replaced with either a set of independent functions or smaller classes (or some combination).
If the answer to both questions is yes then go ahead and create a class.
For your example I think option 1 is the way to go. The searcher and scorer objects sound like, if they were classes, they would only have a single method, probably called something like execute or run. Make them functions instead.
Depending on your use case, quickSearcher itself may be better off as a function or generator as well, so there may be no need for any classes at all.
BTW there is no distinction in Python between a function and a pointer to a function.
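As a rough sketch of option 1 under these recommendations (all function names and the toy data here are illustrative, not from the question):

from functools import partial

def keyword_search(query, files):
    """One hypothetical search method: keep files containing the query."""
    return [f for f in files if query in f["text"]]

def length_score(query, candidate, weight=1.0):
    """One hypothetical scoring method with a tunable parameter."""
    return weight * len(candidate["text"])

def quick_search(searcher, scorer, query, files):
    """A quicksearch: a search stage followed by a scoring/ranking stage."""
    candidates = searcher(query, files)
    return sorted(candidates, key=lambda c: scorer(query, c), reverse=True)

# Methods that differ only in a parameter value are one functools.partial
# away, and trying every combination is two nested loops:
searchers = [keyword_search]
scorers = [length_score, partial(length_score, weight=2.0)]
files = [{"text": "alpha beta"}, {"text": "alpha"}]
for searcher in searchers:
    for scorer in scorers:
        print(quick_search(searcher, scorer, "alpha", files))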

Understand Python code programmatically without executing it

I am implementing a workflow management system, where the workflow developer inherits from a Workflow class and overrides a small process function. The class offers a method named add_component to add a component to the workflow (a component is the execution of a piece of software, or can be something more complex).
In order to display status, my Workflow class needs to know what components have been added to the workflow. To do so I tried two things:
Execute the process function twice: the first pass gathers all the required components, the second is the real execution. The problem is, if the workflow developer does anything other than adding components (adds an element to a database, creates a file), it will be done twice!
Parse the Python code of the function to extract only the add_component lines. This works, but if a component sits inside an if/else branch and should not be executed, it still appears in the monitoring!
I'm wondering if there is another solution (I thought about describing my workflow as XML or something easier to parse, but that is less flexible).
You cannot know what a program does without "executing" it (it could be in some context where you mock the things you don't want modified, but that looks like shooting at a moving target).
If you do handmade parsing, there will always be some cases you miss.
You should break the code into two functions:
a first one where the code can only add_component(s), without any side effects, but with the possibility to run real code to check the environment etc. in order to know which components to add;
a second one that can have side effects and relies on the added components.
Using XML (or any static format) is similar, except:
you are certain there are no side effects (you don't need to rely on the programmer respecting the documentation);
there is much less flexibility, so be sure you need it.
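A minimal sketch of the two-function split described above (only add_component comes from the question; declare, run and the example components are hypothetical names):

class Component:
    def execute(self):
        print("executing", type(self).__name__)

class FetchData(Component):      # hypothetical components
    pass

class PostProcess(Component):
    pass

class Workflow:
    def __init__(self):
        self._components = []

    def add_component(self, component):
        self._components.append(component)

    def components(self):
        # What the monitoring display needs, without running anything.
        return list(self._components)

    def declare(self):
        """Phase 1: only add_component calls, no side effects."""
        raise NotImplementedError

    def run(self):
        """Phase 2: the real execution, side effects allowed."""
        for component in self._components:
            component.execute()

class MyWorkflow(Workflow):
    def declare(self):
        self.add_component(FetchData())
        self.add_component(PostProcess())

wf = MyWorkflow()
wf.declare()   # monitoring can stop here and list wf.components()
wf.run()       # execution performs the real work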

Dynamically broadcast configuration changes in python twisted

I am about to refactor the code of a Python project built on top of Twisted. So far I have been using a simple settings.py module to store constants and dictionaries, like:
#settings.py
MY_CONSTANT = 'whatever'
A_SLIGHTLY_COMPLEX_CONF = {'param_a': 'a', 'param_b': 'b'}
A great many modules import settings.py to do their stuff.
The reason I want to refactor the project is that I need to change/add configuration parameters on the fly. The approach I am about to take is to gather all configuration in a singleton and access its instance whenever I need to.
from settings import MyBloatedConfig

def first_interesting_function():
    cfg = MyBloatedConfig.get_instance()
    a_much_needed_param = cfg["a_respectable_key"]
    # do stuff

# several thousand functions later

def gazillionth_function_in_module():
    tired_cfg = MyBloatedConfig.get_instance()
    a_frustrated_value = tired_cfg["another_respectable_key"]
    # do other stuff
This approach works but feels unpythonic and bloated. An alternative would be to externalize the cfg object in the module, like this:
CONFIG = MyBloatedConfig.get_instance()

def a_suspiciously_slimmer_function():
    suspicious_value = CONFIG["a_shady_parameter_key"]
Unfortunately this does not work if I am changing the MyBloatedConfig instance's entries in another module. Since I am using the reactor pattern, storing stuff in a thread local is out of the question, as is using a queue.
For completeness, the following is the implementation I am using for the singleton pattern:
from functools import wraps

instances = {}

def singleton(cls):
    """Use class as singleton."""
    global instances

    @wraps(cls)
    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]
    return get_instance

@singleton
class MyBloatedConfig(dict):
    ...
Is there some other more pythonic way to broadcast configuration changes across different modules?
The big, global (often singleton) config object is an anti-pattern.
Whether you have settings.py, a singleton in the style of MyBloatedConfig.get_instance(), or any of the other approaches you've outlined here, you're basically using the same anti-pattern. The exact spelling doesn't matter; these are all just ways to have a true global (as distinct from a Python module-level global) shared by all of the code in your entire project.
This is an anti-pattern for a number of reasons:
It makes your code difficult to unit test. Any code that changes its behavior based on this global is going to require some kind of hacking - often monkey-patching - in order to let you unit test its behavior under different configurations. Compare this to code which is instead written to accept arguments (as in, function arguments) and alters its behavior based on the values passed to it.
It makes your code less re-usable. Since the configuration is global, you'll have to jump through hoops if you ever want to use any of the code that relies on that configuration object under two different configurations. Your singleton can only represent one configuration. So instead you'll have to swap global state back and forth to get the different behavior you want.
It makes your code harder to understand. If you look at a piece of code that uses the global configuration and you want to know how it works, you'll have to go look at the configuration. Much worse than this, though, is if you want to change your configuration you'll have to look through your entire codebase to find any code that this might affect. This leads to the configuration growing over time, as you add new items to it and only infrequently remove or modify old ones, for fear of breaking something (or for lack of time to properly track down all users of the old item).
The above problems should hint to you what the solution is. If you have a function that needs to know the value of some constant, make it accept that value as an argument. If you have a function that needs a lot of values, then create a class that can wrap up those values in a convenient container and pass an instance of that class to the function.
The part of this solution that often bothers people is the part where they don't want to spend the time typing out all of this argument passing. Whereas before you had functions that might have taken one or two (or even zero) arguments, now you'll have functions that might need to take three or four arguments. And if you're converting an application written in the style of settings.py, then you may find that some of your functions used half a dozen or more items from your global configuration, and these functions suddenly have a really long signature.
I won't dispute that this is a potential issue, but it should be looked upon mostly as an issue with the structure and organization of the existing code. The functions that end up with grossly long signatures depended on all of that data before. The fact was just obscured from you. And as with most programming patterns which hide aspects of your program from you, this is a bad thing. Once you are passing all of these values around explicitly, you'll see where your abstractions need work. Maybe that 10 parameter function is doing too much, and would work better as three different functions. Or maybe you'll notice that half of those parameters are actually related and always belong together as part of a container object. Perhaps you can even put some logic related to manipulation of those parameters onto that container object.
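As a small sketch of the argument-passing style (the names and the DatabaseConfig container here are hypothetical, just to show the shape of the refactor):

from dataclasses import dataclass

@dataclass
class DatabaseConfig:
    """Related parameters grouped into a small container object."""
    host: str
    port: int
    timeout_seconds: float = 5.0

def connect(config: DatabaseConfig):
    # Behavior depends only on the arguments passed in, so a unit test
    # can construct any configuration it likes without monkey-patching.
    print(f"connecting to {config.host}:{config.port}")

def first_interesting_function(a_much_needed_param):
    # The caller passes the value explicitly instead of the function
    # reaching out to a global MyBloatedConfig singleton.
    ...

connect(DatabaseConfig(host="localhost", port=5432))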

Design of a multi-level abstraction software

I'm working on designing a piece of software that has a few levels of abstraction. This might be the most complex piece of code I've ever started designing, and it has a requirement for easy upgrading, so I want to make sure I'm on the right track before I start coding anything.
Essentially, there will be three main levels of classes. These levels will need to talk with each other.
The first is the input source data. There are currently two main types of input data, which produce similar, but not identical, output. The main goal of these classes will be to get the data from the two different sources and convert it into a common interface, for use in the rest of the program.
The second set will be an adapter for an external library. The library has been periodically updated, and I have no reason to suspect it will not continue to be updated over the years. Most likely, each upgrade will remain very similar to the previous one, but there might be some small changes needed to support a new library version. This level will be responsible for taking the inputs and formatting them for use by an output class.
The last level is the outputs. I don't think multiple versions will be required for this, but there will need to be at least two different output directories specified. I suspect the easiest thing to do would be to simply pass in an output directory when the output class is created, and that is the only level of abstraction required. This class will be frequently updated, but there is no requirement to support multiple versions.
Set up the code as follows, essentially following a bridge pattern, but with multiple abstraction layers.
The input class will be the abstraction. The currently two different means of getting output will be the two different concrete classes, and more concrete classes can be added if required.
The wrapper class will use a factory pattern. Most of the code should be common between the various implementations, so this should work well to handle minute differences.
The output class will be included as part of the implementor class. There isn't really a pattern required, as only one version of this class will ever be needed. Also, the implementor will likely be a singleton.
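One possible shape for these three levels, as a sketch (all class names, the version check, and the toy transformations are illustrative assumptions):

class InputSource:
    """Abstraction: converts one kind of source data to a common interface."""
    def read(self):
        raise NotImplementedError

class FileInput(InputSource):       # the two current concrete sources
    def read(self):
        return ["record-from-file"]

class NetworkInput(InputSource):
    def read(self):
        return ["record-from-network"]

class LibraryAdapter:
    """Implementor: wraps one version of the external library."""
    def process(self, records):
        raise NotImplementedError

class AdapterV1(LibraryAdapter):
    def process(self, records):
        return [r.upper() for r in records]

class AdapterV2(LibraryAdapter):
    def process(self, records):
        return [r.lower() for r in records]

def make_adapter(library_version):
    """Factory: picks the adapter matching the installed library version."""
    if library_version.startswith("2."):
        return AdapterV2()
    return AdapterV1()

class Output:
    """Only one version needed; the directory is the sole variation."""
    def __init__(self, directory):
        self.directory = directory

    def write(self, results):
        print(f"writing {results} to {self.directory}")

source = FileInput()
adapter = make_adapter("2.0")
Output("/tmp/out").write(adapter.process(source.read()))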
