How to avoid circular dependencies in validation module - python

I recently refactored my code to put input validation methods that are shared among several classes into their own module, validate.py. Some of these validation methods check whether their input is an instance of a class, e.g. MyClass. Therefore validate.py must import MyClass so its method is_MyClass can check isinstance(input, MyClass). But I also want to use some validation methods from validate.py in MyClass to sanitize input to MyClass.my_method, so MyClass must import validate.py.
Something tells me I just casually refactored my way into an anti-pattern. If what I'm trying to do implies circular dependencies, then I must be Doing It Wrong™.
But, code reuse is a good idea. So what's the best practice for sharing validation methods in this way?

I think the parts of the validation code that are specific to one of the classes should probably be put into the class itself - maybe as a classmethod? That way the 'generic' validation code can just call obj.validate() at the appropriate time. You then don't need to import the classes from the generic validation code.
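A minimal sketch of that idea, here using a plain instance method rather than a classmethod (the value attribute and the validate_all helper are made up for illustration):
# my_class.py -- class-specific validation lives on the class itself
class MyClass:
    def __init__(self, value):
        self.value = value

    def validate(self):
        if not isinstance(self.value, int):
            raise ValueError("MyClass.value must be an int")

# validate.py -- generic code never needs to import MyClass
def validate_all(objects):
    for obj in objects:
        obj.validate()  # each object knows how to check itself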

While Tom Dalton's answer is probably correct as far as the best design goes, it may be worth noting that import cycles often work just fine in Python.
The limitation though is that you need to use import my_module syntax and avoid top-level (global) code that uses the imported modules. Declaring functions (or classes with methods) that use the imported module is fine.
You usually run into trouble if you're using from my_module import obj or something similar, since this will only work if obj has already been defined in the other module. If that other module is in the process of importing your module, the class definition or global variable assignment may not have happened yet.
So for your specific case, an alternative solution may be to have your validate module use import my_class; then is_MyClass can do isinstance(input, my_class.MyClass).
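For example, a minimal sketch using the question's module and class names:
# validate.py
import my_class  # note: "import my_class", not "from my_class import MyClass"

def is_MyClass(input):
    # my_class.MyClass is only looked up when this runs, so the import
    # cycle between the two modules is harmless
    return isinstance(input, my_class.MyClass)

# my_class.py
import validate

class MyClass:
    def my_method(self, arg):
        if not validate.is_MyClass(arg):
            raise TypeError("expected a MyClass instance")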


What is the benefit of putting an import statement inside of a class? [duplicate]

I created a module named util that provides classes and functions I often use in Python.
Some of them need other modules to be imported. What are the pros and cons of importing the needed modules inside a class/function definition? Is it better than importing at the beginning of the module file? Is it a good idea?
It's the most common style to put every import at the top of the file. PEP 8 recommends it, which is a good reason to do it to start with. But that's not a whim: it has advantages (although not critical enough to make everything else a crime). It allows finding all imports at a glance, as opposed to looking through the whole file. It also ensures everything is imported before any other code (which may depend on some imports) is executed. NameErrors are usually easy to resolve, but they can be annoying.
There's no (significant) namespace pollution to be avoided by keeping the module in a smaller scope, since all you add is the actual module name (no, import * doesn't count and probably shouldn't be used anyway). Inside functions, the import statement runs again on every call (not really harmful, since the module is only actually loaded once, but uncalled for).
PEP 8, the Python style guide, states that:
Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.
Of course this is no hard and fast rule, and imports can go anywhere you want them to. But putting them at the top is the best way to go about it. You can of course import within functions or a class.
But note you cannot do this:
def foo():
    from os import *
because import * is only allowed at module level; Python 2 emits
SyntaxWarning: import * only allowed at module level
and Python 3 rejects it outright with a SyntaxError.
Like flying sheep, I agree that the others are right, but I put imports in other places, such as in __init__() routines and inside functions, when I am DEVELOPING code. After my class or function has been tested and proven to work with the import inside of it, I normally give it its own module with the import following PEP 8 guidelines. I do this because sometimes I forget to delete imports after refactoring code or removing old code with bad ideas. By keeping the imports inside the class or function under development, I am specifying its dependencies should I want to copy it elsewhere or promote it to its own module...
Only move imports into a local scope, such as inside a function definition, if it's necessary to solve a problem such as avoiding a circular import, or if you are trying to reduce the initialization time of a module. This technique is especially helpful if many of the imports are unnecessary depending on how the program executes. You may also want to move imports into a function if the modules are only ever used in that function. Note that loading a module the first time may be expensive because of the one-time initialization of the module, but loading a module multiple times is virtually free, costing only a couple of dictionary lookups. Even if the module name has gone out of scope, the module is probably available in sys.modules.
https://docs.python.org/3/faq/programming.html#what-are-the-best-practices-for-using-import-in-a-module
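As a small illustration of such a deferred, function-local import (json here is just an arbitrary stand-in for an expensive or rarely needed module):
def dump_report(data, path):
    import json  # only loaded if and when this function is actually called
    with open(path, "w") as fh:
        json.dump(data, fh)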
I believe that it's best practice (according to some PEPs) to keep import statements at the beginning of a module. You can also add import statements to an __init__.py file, which makes those modules available to everything inside the package.
So...it's certainly something you can do the way you're doing it, but it's discouraged and actually unnecessary.
While the other answers are mostly right, there is a reason why python allows this.
It is not smart to import stuff that isn't needed. So if you want to, e.g., parse XML into an element tree, but don't want to use the slow builtin XML parser when lxml is available, you would need to check for lxml the moment you need to invoke the parser.
And instead of memorizing the availability of lxml at the beginning, I would prefer to try importing and using lxml and, if it's not there, fall back to the builtin xml module.
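Roughly like this (a common fallback pattern; lxml.etree mirrors the standard-library ElementTree API for basic parsing):
try:
    from lxml import etree                    # fast C implementation, if installed
except ImportError:
    import xml.etree.ElementTree as etree     # builtin fallback

root = etree.fromstring("<root><item>1</item></root>")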

How to share the same variable between imported modules

There are two Python scripts: master.py and to_be_imported.py
Here is the master.py:
import os
os.foo = 12345
import to_be_imported
And here is the to_be_imported.py:
import os
if hasattr(os, 'foo'):
    print('os hasattr foo: %s' % os.foo)
Now when I run master.py I get this:
os hasattr foo: 12345
indicating that the imported module to_be_imported.py picks up the attribute set on os by the script that imported it (master.py).
While it works fine, I would like to know why it works, and also to make sure it is a safe practice.
If a module is already imported, subsequent imports of the module use the cached version of the module, even if you reference it via different names, as in the following case:
import os as a
import os as b
Both refer to the same os module that was imported the first time, so any attribute assigned to that module will be shared.
You can verify it using the built-in python function id()
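For example (the is check shows the same thing as comparing id() values):
import os as a
import os as b

a.foo = 12345
print(a is b)           # True: both names refer to the same cached module object
print(id(a) == id(b))   # True as well
print(b.foo)            # 12345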
Nothing is a bad idea per se, but you must remember a few things:
Modules are objects in Python. They are loaded only once and added to sys.modules. These objects can also have attributes added to them like regular objects (with no messy setattr implementation needed).
Since they are objects, but not instantiable ones, you must treat them as singletons (they are singletons, after all), and you must weigh the disadvantages and benefits of such a model:
a. Singletons are only one object. Are you sure that accessing their attributes is concurrency-safe?
b. Modules are global objects. Are you sure you can track the whole behavior and access to their members? Are you sure you will be able to debug errors there?
Is the code something you will work on with others?
While no idea is inherently better than another, good practice tells us that using global variables is frowned upon, especially if we have a team to work with. On the other hand, if your code is concurrent and/or reentrant, avoid using global variables or relying on module attributes. Otherwise you will have no problem assigning attributes like that; they will last for the life of your script's execution.
This is not the place to choose the best alternative. Depending on how you state your problem, you can ask it either on programmers or codereview. You can choose among many ways to share state without using global variables in modules, like passing those variables inside a state object back and forth across arguments, or learning and using OOP. But, again, that is out of scope for this site.
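If you do want to avoid module attributes, a tiny sketch of passing the shared state around explicitly instead (the Settings name is made up for illustration):
class Settings(object):
    def __init__(self, foo):
        self.foo = foo

def report(settings):
    print(settings.foo)

report(Settings(foo=12345))   # prints 12345; no module-level state involved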

When to use a Singleton in python?

There are many questions related to the use of the Singleton pattern in python, and although this question might repeat many of the aspects already discussed, I have not found the answer to the following specific question.
Let's assume I have a class MyClass which I want to instantiate only exactly once. In python I can do this as follows in the code myclass.py:
class MyClass(object):
    def foo(self):
        ...

instance = MyClass()
Then in any other program I can refer to the instance simply with
import myclass
myclass.instance.foo()
Under what circumstances is this approach enough? Under what circumstances is the use of a Singleton pattern useful/mandatory?
The singleton pattern is more often a matter of convenience than of requirement. Python is a little bit different from other languages in that it is fairly easy to mock out singletons in testing (just clobber the global variable!), but it is nevertheless a good idea to ask yourself when creating a singleton: am I doing this for the sake of convenience or because it is strictly necessary that there is only one instance? Is it possible that there may be more than one in the future?
If you create a class that really will be only constructed once, it may make more sense to make the state a part of the module, and to make its methods into module-level functions. If there is a possibility that the assumption of exactly one instance may change in the future, then it is often much better to pass the singleton instance around rather than referencing the singleton through a global name.
For example, you can just as easily implement a "singleton" this way:
if __name__ == '__main__':
    instance = MyClass()
    doSomethingWith(instance)
In the above, "instance" is singleton by virtue of the fact that it is constructed only once, but the code that handles it is provided the instance rather than referencing module.instance, which makes it easier to reuse pieces of the code if, in some future situation, you need more than one MyClass.
Assuming you want to use a module as a singleton, as Michael Aaron Safyan suggests, you can make it work even if the module isn't imported by the main code, by doing something like the following (in the main code or a module it imports directly or indirectly). What it does is create an instance class attribute, initialize it to an instance, and then replace the module object in sys.modules with that instance:
class _MyClass(object):
    def foo(self):
        print('foo()')

_MyClass.instance = _MyClass()

import sys
_ref = sys.modules[__name__]  # Reference to current module so it's not deleted
sys.modules[__name__] = _MyClass.instance
I've found singletons a useful way to implement "registers" of things when it makes sense to have only one (registry) -- such as a group of classes for a class factory, a group of constants, or a bundle of configuration information. In many cases just a regular Python module will do fine because, by default, modules are effectively already singletons due to the fact that those already loaded get cached in the sys.modules dictionary.
Occasionally, however, class instances are preferable because they can be passed construction parameters and can have properties -- something built-in module objects don't and can't be made to possess. Limitations like that can be worked around using the trick shown above, which effectively turns custom class instances into module objects.
The idea of using class instances as module objects is from Alex Martelli's ActiveState recipe named Constants in Python.
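A tiny sketch of a plain module acting as such a registry (register and create are made-up helper names):
# registry.py -- effectively a singleton because modules are only loaded once
_classes = {}

def register(name, cls):
    _classes[name] = cls

def create(name, *args, **kwargs):
    return _classes[name](*args, **kwargs)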
In my humble opinion, there are two sides to the singleton pattern.
1. You want a single context for a given service because more than one does not make sense.
2. You want to absolutely prevent people from creating two objects of a given type because it might break your service.
While the first case may have some applications (logging service), the second one is often the sign of a bad design.
You should design your API so that your users do not have to think about this problem. But if they dig through your undocumented layers to find your hidden constructor and want to use it for whatever reason, they should not have to deal with useless constructs created to prevent them from doing what they need to do.

How to call methods of the instance of a library already in scope of the current running test case

I have a library that interfaces with an external tool and exposes some basic keywords for use from robotframework. This library is implemented as a Python package, and I would like to add extended functionality that implements complex logic, and exposes more keywords, within modules of this package. The package is given test case scope, but I'm not entirely sure how this works. If I suggest a few ways I have thought of, could someone with a bit more knowledge let me know where I'm on the right track, and where I'm barking up the wrong tree...
Use an instance variable - if the scope is such that the Python interpreter will see the package as imported by the current test case (i.e. this is treated as a separate package in different test cases rather than a separate instance of the same package), then on initialisation I could set a global variable INSTANCE to self and then, from another module within the package, import INSTANCE and use it.
Use an instance dictionary - if the scope is such that all imports see the package as the same, I could use robot.running.context to set a dictionary key such that there is an item in the instance dictionary for each context where the package has been imported - this would then mean that I could use the same context variable as a lookup key in the modules that are based on this. (The disadvantage of this one is that it will prevent garbage collection until the package itself is out of scope, and relies on it being in scope persistently.)
A context variable that I am as of yet unaware of that will give me the instance that is in scope. The docs are fairly difficult to search, so it's fully possible that there is something that I'm missing that will make this trivial. Also just as good would be something that allowed me to call the keywords that are in scope.
Some excellent possibility I haven't considered....
So can anyone help?
Credit for this goes to Kevin O. from the robotframework user group, but essentially the magic lives in robot.libraries.BuiltIn.BuiltIn().get_library_instance(library_name) which can be used like this:
from robot.libraries.BuiltIn import BuiltIn

class SeleniumTestLibrary(object):
    def element_should_be_really_visible(self, locator):
        s2l = BuiltIn().get_library_instance('Selenium2Library')
        element = s2l._element_find(locator, True, False)
It sounds like you are talking about monkeypatching the imported code, so that other modules which import that package will also see your runtime modifications. (Correct me if I'm wrong; there are a couple of bits in your question that I'm not quite following)
For simple package imports, this should work:
import my_package

def method_override():
    return "Foo"

my_package.some_method = method_override
my_package, in this case, refers to the imported module, and is not just a local name, so other modules will see the overridden method.
This won't work in cases where other code has already done
from my_package import some_method
Since in that case, some_method is a local name in the place it is imported. If you replace the method elsewhere, that change won't be seen.
If this is happening, then you either need to change the source to import the entire module, or patch a little bit deeper, by replacing method internals:
import my_package

def method_override():
    return "Foo"

# replace the code object behind some_method (__code__; func_code in Python 2)
my_package.some_method.__code__ = method_override.__code__
At that point, it doesn't matter how the method was imported in any other module; the code object associated with the method has been replaced, and your new code will run rather than the original.
The only thing to worry about is whether the module is imported from the same path everywhere. The Python interpreter will try to reuse existing modules, rather than re-import and re-initialize them, whenever they are imported from the same path.
However, if your python path is set up to contain two directories, say: '/foo' and '/foo/bar', then these two imports
from foo.bar import baz
and
from bar import baz
would end up loading the module twice, and defining two versions of any objects (methods, classes, etc) in the module. If that happens, then patching one will not affect the other.
If you need to guard against that case, then you may have to traverse sys.modules, looking for the imported package, and patching each version that you find. This, of course, will only work if all of the other imports have already happened; you can't do it pre-emptively (without writing an import hook, but that's another level deeper again :) )
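A rough sketch of that sys.modules sweep (assuming, as described above, that the same package may have been loaded under more than one name):
import sys

def patch_everywhere(package_name, attr_name, replacement):
    # patch every loaded copy of the package, e.g. both "bar" and "foo.bar"
    for name, module in list(sys.modules.items()):
        if module is None:
            continue
        if name == package_name or name.endswith("." + package_name):
            if hasattr(module, attr_name):
                setattr(module, attr_name, replacement)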
Are you sure you can't just fork the original package and extend it directly? That would be much easier :)

organising classes and modules in python

I'm getting a bit of a headache trying to figure out how to organise modules and classes together. Coming from C++, I'm used to classes encapsulating all the data and methods required to process that data. In Python, however, there are also modules, and from code I have looked at, some people keep a lot of loose functions in modules, whereas others almost always bind their functions to classes as methods.
For example say I have a data structure and would like to write it to disk.
One way would be to implement a save method for that object so that I could just type
MyObject.save(filename)
or something like that. Another method I have seen in equal proportion is to have something like
from myutils import readwrite
readwrite.save(MyObject,filename)
This is a small example, and I'm not sure how python specific this problem is at all, but my general question is what is the best pythonic practice in terms of functions vs methods organisation?
It seems like loose functions bother you. This is the python way. It makes sense because a module in python is really just an object on the same footing as any other object. It does have language level support for loading it from a file but other than that, it's just an object.
so if I have a module foo.py:
import pprint

def show(obj):
    pprint.pprint(obj)
Then when I import it from bar.py:
import foo

class fubar(object):
    # code
    def method(self, obj):
        # more stuff
        foo.show(obj)
I am essentially accessing a method on the foo object. The data attributes of the foo module are just the globals that are defined in foo. A module is the language-level implementation of a singleton without the need to prepend self to every method's argument list.
I try to write as many module level functions as possible. If some function will only work with an instance of a particular class, I will make it a method on the class. Otherwise, I try to make it work on instances of every class that is defined in the module for which it would make sense.
The rationale behind the exact example that you gave is that if each class has a save method, then if you later change how you are saving data (say, from the filesystem to a database or a remote XML file), you have to change every class. If each class instead implements an interface to yield the data it wants saved, then you can write one function to save instances of every class and only change that function once. This is known as the Single Responsibility Principle: each class should have only one reason to change.
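A minimal sketch of that split (the to_dict interface and the Point class are purely illustrative, not from the question):
import json

class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def to_dict(self):            # each class only describes its own data
        return {"x": self.x, "y": self.y}

def save(obj, filename):          # one place decides how data is persisted
    with open(filename, "w") as fh:
        json.dump(obj.to_dict(), fh)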
If you have a regular old class you want to save to disk, I would just make it an instance method. If it were a serialization library that could handle different types of objects, I would go with the second way.
