Using classes as containers for functions and "global" variables: bad design? - python

I spent the last few months rewriting a new version of my Python algorithm from scratch. One of my goals was to write perfectly documented code, easy to read and understand for "anyone".
In the same project folder I put a lot of different modules, and each module contains a class. I used classes as containers for functions and their related variables; that way, a class contains all the functions with a specific task, for example writing all the output results of the algorithm to Excel files.
Here is an example:
Algorithm.py

import os
import pandas as pd
import numpy as np
from Observer import Observer

def main(hdf_path):
    for hdf_file in os.listdir(hdf_path):
        filename = str(hdf_file.replace('.hdf', '.xlsx'))
        Observer.create_workbook(filename)
        dataframe = pd.read_hdf(hdf_file)
        years_array = dataframe.index.levels[0].values
        for year in years_array:
            year_mean = np.mean(dataframe.loc[year].values)
            Observer.mean_values = np.append(Observer.mean_values, year_mean)
        Observer.export_results()

if __name__ == "__main__":
    hdf_path = 'bla/bla/bla/'
    main(hdf_path)
Observer.py

import numpy as np
import openpyxl

class Observer:
    workbook = None
    workbookname = None
    mean_values = np.array([])

    @staticmethod
    def create_workbook(filename):
        Observer.workbook = openpyxl.Workbook()
        Observer.workbookname = filename
        # do other things

    @staticmethod
    def save_workbook():
        Observer.workbook.save('results_path' + Observer.workbookname)

    @staticmethod
    def export_results():
        # print Observer.mean_values values in different workbook cells
        # export result on a specific sheet
        pass
I hope this simple example shows how I use classes in my project. For every class I define a lot of variables (workbook, for example) and I call them from other modules as if they were global variables. That way I can easily access them from anywhere, and I don't need to pass them to functions explicitly, because I can simply write Classname.varname.
My question is: is it bad design? Will it create problems or performance slowdowns?
Thanks for your help.

My question is: is it bad design?
Yes.
I can simply write Classname.varname.
You are creating very strong coupling between classes when you force callers to write Classname.varname. The class that accesses this variable is now strongly coupled with Classname. This prevents you from changing the behavior in an OOP way by passing different parameters, and it will complicate testing of the class, since you will be unable to mock Classname and use the mock instead of the "real" class.
This will result in code duplication when you try to run two pieces of very similar code in two parts of your app which differ only in these parameters. You will end up creating two almost identical classes, one using the Workbook class and the other using the Notepad class.
And remember the vicious cycle:

Hard-to-test code -> Fear of refactoring -> Sloppy code
        ^                                      |
        |                                      |
        +--------------------------------------+
Using proper objects, with the ability to mock them (and dependency injection), will guarantee your code is easily testable, and the rest will follow.
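For illustration, here is a minimal sketch of the same code with dependency injection: the observer becomes an instance that is created once and passed in explicitly, so main() can be tested with a mock. The instance-based Observer below is an assumption for the sketch, not the original poster's code.

import os
import numpy as np
import openpyxl

class Observer:
    def __init__(self):
        self.workbook = None
        self.workbookname = None
        self.mean_values = np.array([])

    def create_workbook(self, filename):
        self.workbook = openpyxl.Workbook()
        self.workbookname = filename

def main(hdf_path, observer):
    # the collaborator arrives as a parameter instead of Classname.varname
    for hdf_file in os.listdir(hdf_path):
        observer.create_workbook(hdf_file.replace('.hdf', '.xlsx'))
        # ... compute means, then call observer.export_results()

if __name__ == "__main__":
    main('bla/bla/bla/', Observer())
    # in a test: main(tmp_path, unittest.mock.MagicMock())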

How should I avoid duplicate imports when writing a package? [duplicate]

I'm making a python package to run analyses with pandas, and I use pandas objects in most files in the package. How do I import those functions so they're usable in the package but don't clutter the namespace for a user? Say I have this directory structure:
MyThing/
    MyThing/
        __init__.py
        apis.py
        MyClass.py
where MyClass.py provides a class I will instantiate to process data in memory and apis.py has interfaces to local and remote databases. As a demonstration, say __init__.py contains
from MyThing.MyClass import MyClass
from MyThing.apis import DBInterface
the contents of MyClass.py are
class MyClass:
    def __init__(self):
        pass

and apis.py is

import pandas as pd

class DBInterface:
    def __init__(self):
        pass
With complete code, I expect the use case to look something like this:

import MyThing as mt

# get some data
interface = mt.DBInterface()
some_data = interface.query(parameters)

# load it into MyClass
instance = mt.MyClass(some_data)

# add new data from another source
instance.read(filename)

# make some fancy products
instance.magic(parameters)

# update the database
interface.update_db(instance)
The concern I have is that dir(mt.apis) shows everything I've imported, meaning I can do things like make a pandas DataFrame with df = mt.apis.pd.DataFrame(). Is this how it's supposed to work? Should I be using import differently so the namespace isn't cluttered with dependencies? Should I design the package differently so the dependencies aren't available when I import MyThing?
What you are doing is fine and is how it's supposed to work, and I wouldn't advise trying hard to hide your pandas import.
The solution to this df = mt.apis.pd.DataFrame() is: don't do that.
If there is a function or variable within MyThing.apis that you don't want others to use, you can prefix it with a single underscore (e.g. _foo). By convention this is understood to be for "internal use", and it is not imported when you do from MyThing.apis import *. See this section of the PEP 8 style guide for more information about naming conventions of this sort.
If you'd like to be more explicit about what your module exports, you may define __all__ = ['foo', 'bar']. This also means that if you (or someone else) do from MyThing.apis import * (which is generally ill-advised anyway), only foo and bar will be imported. But you should treat this as a mere suggestion, just like the leading-underscore convention.
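As a minimal sketch of both conventions applied to the apis.py above (the _connect helper is hypothetical, added only to illustrate the underscore prefix):

# apis.py
import pandas as pd

__all__ = ['DBInterface']  # star-imports will only pick up DBInterface

def _connect(url):
    # leading underscore: internal helper, not part of the public API
    pass

class DBInterface:
    def __init__(self):
        pass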

Is it bad practice to modify attributes of one module from another module?

I want to define a bunch of config variables that can be imported in all the modules in my project. The values of those variables will be constant during runtime but are not known before runtime; they depend on the input. Usually I'd define a dict in my top module which would be passed to all functions and classes from other modules; however, I was thinking it may be cleaner to simply create a blank config.py module which would be dynamically filled with config variables by the top module:
# top.py
import config
config.x = x

# config.py
x = None

# other.py
import config
print(config.x)
I like this approach because I don't have to save the parameters as attributes of classes in my other modules, which makes sense to me because the parameters do not describe the classes themselves.
This works, but is it considered bad practice?
The question as such may be disputed, but I would generally say yes, it's "bad practice", because the scope and impact of a change get blurred. Note that the use case you're describing is not really about sharing configuration, but about different parts of the program (functions, objects, modules) exchanging data, and as such it's a variation on the (meta)global variable.
Reading common configuration values could be fine, but changing them along the way... you may lose track of what happened where, and in which order, as modules get imported and values get modified. For instance, assume the config.py above and two modules, m1.py:
import config
print(config.x)
config.x=1
and m2.py:
import config
print(config.x)
config.x=2
and a main.py that just does:
import m1
import m2
import config
print(config.x)
or:
import m2
import m1
import config
print(config.x)
The state in which you find config in each module (and really in any other, including main.py here) depends on the order in which the imports occurred and on who assigned what value when. Even for a program entirely under your control, this can get confusing (and become a source of mistakes) rather quickly.
For runtime data and for passing information between objects and modules (and your example really is that, not configuration that is predefined and shared between modules), I would suggest you look into describing the information in a custom state (config) object and passing it around through an appropriate interface. But really, just a function or method argument may be all that is needed. The exact form depends on what exactly you're trying to achieve and what your overall design is.
In your example, other.py behaves differently when called or imported before top.py, which may still seem obvious and manageable in a minimal example, but really is not a very sound design. Anyone reading the code (including future you) should be able to follow its logic, and this IMO breaks its flow.
The most trivial (and procedural) example of what you've described would be an other.py recreating your current behavior:
def do_stuff(value):
    print(value)  # We did something useful here

if __name__ == "__main__":
    do_stuff(None)  # Could also use config with defaults
And your top.py, presumably being the entry point orchestrating the imports and execution, doing:
import other
x = get_the_value()
other.do_stuff(x)
You can of course introduce an interface to configure do_stuff, perhaps a dict or even a custom class, with a default implementation in config.py:

class Params:
    def __init__(self, x=None):
        self.x = x
and your other.py:

import config

def do_stuff(params=config.Params()):
    print(params.x)  # We did something useful here
And in your top.py you can use:
params = config.Params(get_the_value())
other.do_stuff(params)
But you could also have any use-case-specific source of the value(s):

class TopParams:
    def __init__(self, url):
        self.x = get_value_from_url(url)

params = TopParams("https://example.com/value-source")
other.do_stuff(params)
x could even be a property which you retrieve every time you access it... or compute lazily when needed and then cache. Again, it really is then a matter of what you need to do.
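For example, a minimal sketch of the lazy variant, reusing the hypothetical get_value_from_url helper from the snippet above:

class TopParams:
    def __init__(self, url):
        self._url = url
        self._x = None

    @property
    def x(self):
        # fetched on first access, then cached
        if self._x is None:
            self._x = get_value_from_url(self._url)
        return self._x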
"Is it bad practice to modify attributes of one module from another module?"
Yes, it is considered bad practice: it violates the Law of Demeter, which in essence means "talk to friends, not to strangers".
Objects should expose behaviour and functions, but should HIDE the data.
Data structures should EXPOSE data, but should not have any (exposed) methods. The Law of Demeter does not apply to such data structures. OOP purists might cover such data structures with setters and getters, but that really adds no value in Python.
There is a lot of literature about this, e.g. https://en.wikipedia.org/wiki/Law_of_Demeter
and, of course, a must-read: "Clean Code" by Robert C. Martin (Uncle Bob); check it out on YouTube as well.
For procedural programming it is perfectly normal to keep data in a data structure which does not have any (exposed) methods. The procedures in the program work with that data. Consider using the attrs module (see https://www.attrs.org/en/stable/) for easy creation of such classes.
My preferred method for keeping config is (here without using attrs):
# conf_xy.py
"""
Config is code - so why use damned parsers, text files, XML, YAML, TOML and all that,
if you can just use testable code as config that delivers the correct types, etc.,
as well as hinting in your favorite IDE?
Here, for demonstration, without using the attrs package - usually I use attrs (read the docs).
"""

class ConfXY(object):
    def __init__(self) -> None:
        self.x: int = 1
        self.z: float = get_z_from_input()
        ...

conf_xy = ConfXY()
# other.py
from conf_xy import conf_xy
...
y = conf_xy.x * 2
...
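And, since the answer recommends attrs, here is a minimal sketch of the same config class using that package (get_z_from_input is the same hypothetical helper as above, stubbed out so the sketch runs):

# conf_xy.py - same idea, using attrs
import attr

def get_z_from_input() -> float:
    # hypothetical helper, as in the example above
    return 4.2

@attr.s(auto_attribs=True)
class ConfXY:
    x: int = 1
    z: float = attr.Factory(get_z_from_input)

conf_xy = ConfXY()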

Is it possible to mock a library?

After dozens of searches on the subject and a lot of thinking, I leave it to you in this new question:
Is it possible to mock an entire library with Python? I would like the import of this library and all its packages/modules/etc. to be done without having to define each element by hand with mock and sys.modules... :(
In my case, I use a library specific to my job, and I would like to be able to work on my code at home, on code that does not depend on this library, without having to rewrite my imports.
Example:
"""Main file.
I define the mock here.
"""
mocked = MagicLibraryMock("mylib") # the dream
"""File with lib imports.
I can import anything and use it as a mock.
"""
import mylib
from mylib.a import b
from mylib.z import c
from mylib.a.e.r import x
foo = x()
bar = c.a.e.r.t.d()
bar.side_effect = [1, 2, 3]
bar()
I tried subclassing dict to overload the __getitem__ method of sys.modules. But the problem is that the import machinery also uses __iter__, and there it becomes much more complicated to return a MagicMock for each lookup, given that it is not recommended to modify the import internals directly - source.
In the end, I waste less time extracting the imports from my application into sub-modules which take care of resolving them. I can then intercept these imports more easily without dirtying my code.
The design is more interesting.
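For reference, a minimal sketch of the usual workaround: registering a MagicMock in sys.modules for every (sub)module path before the application code imports it. The mylib paths are the ones from the example above, and note that every intermediate package path needs its own entry:

import sys
from unittest.mock import MagicMock

# register a mock for every module path the code will import
for name in ('mylib', 'mylib.a', 'mylib.z', 'mylib.a.e', 'mylib.a.e.r'):
    sys.modules[name] = MagicMock()

import mylib                # resolved from sys.modules -> a mock
from mylib.a.e.r import x   # attribute lookup on a MagicMock

foo = x()                   # calls just return more MagicMocks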
Thanks for your help.

Dynamic Python Class Definition in SQLAlchemy

I'm creating a backend application with SQLAlchemy using the declarative base. The ORM requires about 15 tables, each of which maps to a class object in SQLAlchemy. Because these class objects are all defined identically, I thought a factory pattern could produce the classes more concisely. However, these classes not only have to be defined, they also have to be assigned to unique variable names so they can be imported and used throughout the project.
(Sorry if this question is a bit long, I updated it as I better understood the problem.)
Because we have so many columns (~1000), we define their names and types in external text files to keep things readable. Having done that, one way to go about declaring our models is like this:
class Foo1(Base):
    __tablename__ = 'foo1'

class Foo2(Base):
    __tablename__ = 'foo2'

... etc
and then I can add the columns by looping over the contents of the external text file and calling setattr() on each class definition.
This is OK, but it feels too repetitive as we have about 15 tables. So instead I took a stab at writing a factory function that could define the classes dynamically:
def orm_factory(class_name):
    class NewClass(Base):
        __tablename__ = class_name.lower()
    NewClass.__name__ = class_name.upper()
    return NewClass
Again I can just loop over the columns and use setattr(). When I put it together it looks like this:
for class_name in class_name_list:
    ORMClass = orm_factory(class_name)
    header_keyword_list = get_header_keyword_list(class_name)
    define_columns(ORMClass, header_keyword_list)
where get_header_keyword_list gets the column information and define_columns performs the setattr() assignments. When I use this and run Base.metadata.create_all(), the SQL schema gets generated just fine.
But when I then try to import these class definitions into another module, I get an error like this:
SAWarning: The classname 'NewClass' is already in the registry of this declarative base, mapped to <class 'ql_database_interface.IR_FLT_0'>
This, I now realize, makes total sense based on what I learned yesterday: Python class variable name vs __name__.
You can address this by using type as a class generator in your factory function (as two of the answers below do). However, this does not solve the issue of being able to import the class: while the classes are dynamically constructed in the factory function, the variable that the function's output is assigned to is static. Even if it were dynamic, such as a dictionary key, it would have to be in the module namespace in order to be imported from another module. See my answer for more details.
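For context, a minimal sketch of that type-based factory; the id primary-key column is my addition, included only so the class is a complete SQLAlchemy model:

from sqlalchemy import Column, Integer
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

def orm_factory(class_name):
    # type() gives every class its own __name__, avoiding the registry clash
    return type(class_name.upper(), (Base,), {
        '__tablename__': class_name.lower(),
        'id': Column(Integer, primary_key=True),
    })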
This sounds like a sketchy idea. But it's fun to solve, so here is how you make it work.
As I understand it, your problem is that you want to add dynamically created classes to a module. I created a hack using a module and the __init__.py file.
dynamicModule/__init__.py:

import dynamic

class_names = ["One", "Two", "Three"]
for new_name in class_names:
    dynamic.__dict__['Class%s' % new_name] = type(
        "Class%s" % new_name, (object,), {'attribute_one': 'blah'})

dynamicModule/dynamic.py:

"""Empty file"""

test.py:

import dynamicModule
from dynamicModule import dynamic
from dynamicModule.dynamic import ClassOne

dynamic.ClassOne
"""This all seems evil but it works for me on python 2.6.5"""

__init__.py:

"""Empty file"""
[Note, this is the original poster]
So after some thinking and talking to people, I've decided that the ability to dynamically create and assign variables to class objects in the global namespace in this way just isn't something Python supports (and likely with good reason). Even though I think my use case isn't too crazy (pumping out a predefined list of identically constructed classes), it's just not supported.
There are lots of questions that point towards using a dictionary in a case like this, such as this one: https://stackoverflow.com/a/10963883/1216837. I thought of something like that, but the issue is that I need those classes in the module namespace so I can import them into other modules. However, adding them with globals(), like globals()['MyClass'] = class_dict['MyClass'], seems like it's getting pretty out there, and my impression is that people on SO frown on using globals() like this.
There are hacks such as the one suggested by patjenk, but at a certain point the obfuscation and complexity outweigh the benefits of the clarity of declaring each class object statically. So while it seems repetitive, I'm just going to write out all the class assignments. Really, this ends up being pretty concise/maintainable:
Class1 = class_factory('class1')
Class2 = class_factory('class2')
...

Is it possible to overload from/import in Python?

Is it possible to overload the from/import statement in Python?
For example, assuming jvm_object is an instance of class JVM, is it possible to write this code:
class JVM(object):
    def import_func(self, cls):
        return something...

jvm = JVM()

# would invoke JVM.import_func
from jvm import Foo
This post demonstrates how to use functionality introduced in PEP 302 to import modules over the web. I post it as an example of how to customize the import statement, rather than as suggested usage ;)
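As a rough illustration of that machinery in its modern form (importlib's meta-path hooks rather than the original PEP 302 protocol), here is a sketch that intercepts imports of a hypothetical jvm package:

import sys
from importlib.abc import Loader, MetaPathFinder
from importlib.machinery import ModuleSpec

class JVMFinder(MetaPathFinder, Loader):
    def find_spec(self, fullname, path=None, target=None):
        # claim only the 'jvm' package and its submodules
        if fullname == 'jvm' or fullname.startswith('jvm.'):
            return ModuleSpec(fullname, self)
        return None

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        module.Foo = 3  # populate however you like, e.g. from a JVM bridge

sys.meta_path.insert(0, JVMFinder())

from jvm import Foo  # now served by JVMFinder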
It's hard to find something which isn't possible in a dynamic language like Python, but do we really need to abuse everything? Anyway, here it is:

from types import ModuleType
import sys

class JVM(ModuleType):
    Foo = 3

sys.modules['JVM'] = JVM

from JVM import Foo
print(Foo)
But one pattern I've seen in several libraries/projects is some kind of _make_module() function, which creates a ModuleType dynamically and initializes everything in it. After that, the current module is replaced by the new module (by assignment to sys.modules) and the _make_module() function gets deleted. The advantage is that you can loop over the module and even add objects to it inside that loop, which is quite useful sometimes (but use it with caution!).
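A minimal sketch of that pattern, with purely illustrative module contents:

# mymodule.py
import sys
from types import ModuleType

def _make_module():
    mod = ModuleType(__name__)
    # populate the fresh module in a loop - the point of the pattern
    for name, value in [('Foo', 1), ('Bar', 2)]:
        setattr(mod, name, value)
    # replace the current module with the dynamically built one
    sys.modules[__name__] = mod

_make_module()
del _make_module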
