Several modules in a package importing one common module - python

I am writing a python package. I am using the concept of plugins - where each plugin is a specialization of a Worker class. Each plugin is written as a module (script?) and spawned in a separate process.
Because of the commonality between the plugins (e.g. all extend a base class 'Worker'), a plugin module generally looks like this:
import commonfuncs

def do_work(data):
    # do customised work for the plugin
    print 'child1 does work with %s' % data
In C/C++, we have include guards, which prevent a header from being included more than once.
Do I need something like that in Python, and if yes, how may I make sure that commonfuncs is not 'included' more than once?

No need to worry: only the first import of a module in the course of a program's execution causes it to be loaded. Every further import just fetches the module object from a cache dictionary (sys.modules, indexed by module name strings), so it's both very fast and free of side effects. No guard is necessary.
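You can observe the cache directly (a short demonstration; commonfuncs stands in for your own module and is assumed to be importable):
import sys

import commonfuncs                      # first import: the module is executed and cached
print('commonfuncs' in sys.modules)     # True

import commonfuncs                      # second import: just a cache lookup, no re-execution
print(commonfuncs is sys.modules['commonfuncs'])  # True, the very same object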

Related

Python - Importing function from external file and global modules

In order to simplify my code, I have put various functions into external files which I load via:
from (external_file) import (function_name)
...which works fine.
My question though has to do with other modules, such as cv2 or numpy - do I need those listed in my external file (as well as my main file) or is there a way to just list them in my main file?
Each file you put Python code in is its own module. Each module has its own namespace. If some of your code (in any module) uses some library code, it will need some way to access the library from the namespace it is defined in.
Usually this means you need to import the library in each module it's used from. Don't worry about duplication: modules are cached when first loaded, so additional imports from other modules quickly find the existing module and just add a reference to it in their own namespaces.
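For instance (a minimal sketch; helpers.py and the function name are illustrative, not from the question):
# helpers.py -- the external file needs its own import of numpy
import numpy as np

def normalize(arr):
    # scale an array so its largest element is 1.0
    return arr / np.max(arr)

# main.py -- importing numpy here again is just a cheap cache lookup,
# not a second copy of the library
import numpy as np
from helpers import normalize

print(normalize(np.array([1.0, 2.0, 4.0])))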
Note that it's generally not a good idea to split up your code too much. There's certainly no need for every function or every class to have its own file. Instead, use modules to group related things together. If you have a couple of functions that interoperate a lot, put them in the same module.

Python: how to ensure that import comes from defining module?

I think the following is bad style:
# In foo_pkg/bla_mod.py:
magic_value=42
# In foo_pkg/bar_mod.py:
from .bla_mod import magic_value
# In doit.py:
from foo_pkg.bar_mod import magic_value
Instead, I'd like to always import an object from the module where it is defined, i.e. in this case:
# In doit.py:
from foo_pkg.bla_mod import magic_value
Finding issues of this sort by hand gets tedious very quickly (for each imported object, you have to open the module and check whether it defines the object or imports it from another module).
What is the best way to automate this check? To my surprise, neither pylint nor pyflakes seem to have an appropriate checker, but maybe there's another tool (or even some trick that can be used in Python itself)?
Problem statement in a nutshell: given a bunch of python source files, find every import of an object from a module that does not itself define the object.
I know there are libraries (including the standard library) where one module provides the main external API and imports the necessary symbols from other modules internally. However, that's not the case in the code base I'm working with, here these are artifacts of refactorings that I really want to eliminate.
Here's a draft script that solves the problem in less than 100 lines: http://pastebin.com/CFsR6b3s
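The rough idea (not the linked script, just a hedged sketch of one way to do it): parse each file with the ast module, and for every absolute "from X import Y", parse X's source and check whether Y is actually defined there rather than merely re-imported:
import ast
import importlib.util

def defined_names(tree):
    # names defined at a module's top level (def/class/assignment),
    # deliberately ignoring names that are merely imported
    names = set()
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Assign):
            names.update(t.id for t in node.targets if isinstance(t, ast.Name))
    return names

def check_file(path):
    with open(path) as f:
        tree = ast.parse(f.read(), path)
    for node in ast.walk(tree):
        # absolute "from X import Y" only; relative imports skipped for brevity
        if isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            try:
                spec = importlib.util.find_spec(node.module)
            except (ImportError, ValueError):
                continue
            if spec is None or not spec.origin or not spec.origin.endswith('.py'):
                continue  # built-in or extension module; cannot parse its source
            with open(spec.origin) as f:
                defs = defined_names(ast.parse(f.read()))
            for alias in node.names:
                if alias.name != '*' and alias.name not in defs:
                    print('%s: %s does not define %s' % (path, node.module, alias.name))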

Mocking Python iterables for use with Sphinx

I'm using Sphinx to document a project that depends on wxPython, using the autodoc extension so that it will automatically generate pages from our docstrings. The autodoc extension automatically operates on every module you import, which is fine for our packages but is a problem when we import a large external library like wxPython. Thus, instead of letting it generate everything from wxPython I'm using the unittest.mock library module (previously the external package Mock). The most basic setup works fine for most parts of wxPython, but I've run into a situation I can't see an easy way around (likely because of my relative unfamiliarity with mock until this week).
Currently, the end of my conf.py file has the following:
MOCK_MODULES = ['wx.lib.newevent']  # I've skipped irrelevant entries...
for module_name in MOCK_MODULES:
    sys.modules[module_name] = mock.Mock()
For all the wxPython modules but wx.lib.newevent, this works perfectly. However, here I'm using the newevent.NewCommandEvent() function[1] to create an event for a particular scenario. In this case, I get a warning on the NewCommandEvent() call with the note TypeError: 'Mock' object is not iterable.
While I can see how one would use patching to handle this for building out unit tests (which I will be doing in the next month!), I'm having a hard time seeing how to integrate that at a simple level in my Sphinx configuration.
Edit: I've just tried using MagicMock() as well; this still produces an error at the same point, though it now produces ValueError: need more than 0 values to unpack. That seems like a step in the right direction, but I'm still not sure how to handle this short of explicitly setting it up for this one module. Maybe that's the best solution, though?
Footnotes
[1] Yes, that's a function, naming convention making it look like a class notwithstanding; wxPython follows the C++ naming conventions used throughout the wxWidgets toolkit.
From the error, it looks like it is actually executing newevent.NewCommandEvent(), so I assume that somewhere in your code you have a top-level line something like this:
import wx.lib.newevent
...
event, binder = wx.lib.newevent.NewCommandEvent()
When autodoc imports the module, it tries to run this line of code, but since NewCommandEvent is actually a Mock object, Python can't bind its output to the (event, binder) tuple. There are two possible solutions. The first is to change your code so that this is not executed on import, maybe by wrapping it inside if __name__ == '__main__'. I would recommend this solution because creating objects like this on import can often have problematic side effects.
The second solution is to tell the Mock object to return appropriate values thus:
wx.lib.newevent.NewCommandEvent = mock.Mock(return_value=(mock.Mock(), mock.Mock()))
However, if you are doing anything in your code with the returned values you might run into the same problem further down the line.
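Putting it together in conf.py might look like this (a sketch assuming the MOCK_MODULES loop from the question; the full list would also include wx, wx.lib, and the other mocked modules):
import sys
from unittest import mock

MOCK_MODULES = ['wx.lib.newevent']  # plus the other wx modules being mocked
for module_name in MOCK_MODULES:
    sys.modules[module_name] = mock.Mock()

# Give the mocked factory a 2-tuple return value so that the top-level
# "event, binder = NewCommandEvent()" unpacking in the documented code succeeds.
sys.modules['wx.lib.newevent'].NewCommandEvent = mock.Mock(
    return_value=(mock.Mock(), mock.Mock()))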

Populating Factory using Metaclasses in Python

Obviously, registering classes in Python is a major use-case for metaclasses. In this case, I've got a serialization module that currently uses dynamic imports to create classes and I'd prefer to replace that with a factory pattern.
So basically, it does this:
data = #(Generic class based on serial data)
moduleName = data.getModule()
className = data.getClass()
aModule = __import__(moduleName)
aClass = getattr(aModule, className)
But I want it to do this:
data = #(Generic class based on serial data)
classKey = data.getFactoryKey()
aClass = factory.getClass(classKey)
However, there's a hitch: If I make the factory rely on metaclasses, the Factory only learns about the existence of classes after their modules are imported (e.g., they're registered at module import time). So to populate the factory, I'd have to either:
manually import all related modules (which would really defeat the purpose of having metaclasses automatically register things...) or
automatically import everything in the whole project (which strikes me as incredibly clunky and ham-fisted).
Of these, just registering the classes directly with the factory seems like the best option. Has anyone found a better solution that I'm just not seeing? One possibility might be to automatically generate the imports required in the factory module by traversing the project files, but unless you do that with a commit hook, you run the risk of your factory getting out of date.
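For reference, the metaclass-based registration under discussion looks roughly like this (a minimal sketch; the registry and class names are illustrative, not from the question):
registry = {}

class RegisteredMeta(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        if bases:                 # skip the abstract base class itself
            registry[name] = cls  # register every concrete subclass by name

class Serializable(metaclass=RegisteredMeta):
    pass

class Invoice(Serializable):
    pass

def get_class(key):
    return registry[key]

# Note: this only works once the defining module has actually been
# imported -- exactly the limitation described above.
print(get_class('Invoice'))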
Update:
I have posted a self-answer, to close this off. If anyone knows a good way to traverse all Python modules across nested subpackages in a way that will never hit a cycle, I will gladly accept that answer rather than this one. The main problem I see happening is:
\A.py (import Sub.S2)
\Sub\S1.py (import A)
\Sub\S2.py
\Sub\S3.py (import Sub.S2)
When you try to import S3, it first needs to import Main (otherwise it won't know what Sub is). At that point, it tries to import A. While there, the __init__.py is called and tries to register A. At this point, A tries to import S1. Since the __init__.py in Sub is hit, it tries to import S1, S2, and S3. However, S1 wants to import A (which does not yet exist, as it is in the process of being imported)! So that import fails. You can switch how the traversal occurs (i.e., depth-first rather than breadth-first), but you hit the same issues.
Any insight on a good traversal approach for this would be very helpful. A two-stage approach can probably solve it (i.e., traverse to collect all module references, then import them as a flat batch). However, I am not quite sure of the best way to handle the final stage (i.e., to know when you are done traversing and then import everything). My big restriction is that I do not want to have a super-package to deal with (i.e., an extra directory containing Sub and A). If I had that, it could kick off the traversal, but everything would need to import relative to it for no good reason (i.e., all imports longer by an extra directory). Thus far, adding a special function call to sitecustomize.py seems like my only option (I set the root directory for package development in that file anyway).
The solution I found to this was to do all imports on the package based off of a particular base directory and have special __init__.py functions for all of the ones that might have modules with classes that I'd want to have registered. So basically, if you import any module, it first has to import the base directory and proceeds to walk every package (i.e., folder) with a similar __init__.py file.
The downside of this approach is that the same modules are sometimes imported multiple times, which is annoying if anyone leaves code with side effects in a module import. However, that's bad either way. Unfortunately, some major packages (cough, cough: Flask) have serious complaints with IDLE if you do this (IDLE just restarts, rather than doing anything). The other downside is that because modules import each other, sometimes it attempts to import a module that it is already in the process of importing (an easily caught error, but one I'm still trying to stamp out). It's not ideal, but it does get the job done. Additional details on the more specific issue are attached, and if anyone can offer a better answer, I will gladly accept it.
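In case it helps, the per-package __init__.py described above can be as simple as this (a hedged sketch; it assumes plain .py modules sitting directly in the package directory):
import os
import importlib

_package_dir = os.path.dirname(__file__)
for _filename in os.listdir(_package_dir):
    if _filename.endswith('.py') and _filename != '__init__.py':
        # importing the module runs its class definitions, which is what
        # triggers the metaclass registration side effects
        importlib.import_module('.' + _filename[:-3], __name__)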

Is there any way to clear the Python bytecode cache?

Each unit test I'm running writes Python code out to a file, then imports it as a module. The problem is that the code changes, but subsequent import statements don't pick up the modified module.
I think what I need is a way to either force a reload of the module or clear the internal bytecode cache. Any ideas?
Thanks!
Getting all the edge cases of reimporting modules right is tricky. The documentation for reload mentions some of them. Depending on what you are testing, you may be better off testing the imports with separate invocations of the interpreter by running each via, say, subprocess. It will likely be slower, but also likely safer and more accurate testing.
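A sketch of that approach (generated_module and its VALUE attribute are stand-ins for whatever file the test wrote out):
import subprocess
import sys

# each run gets a fresh interpreter, so no module or bytecode caching applies
result = subprocess.run(
    [sys.executable, '-c', 'import generated_module; print(generated_module.VALUE)'],
    capture_output=True, text=True)
print(result.stdout)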
Use reload().
Reload a previously imported module. The argument must be a module object, so it must have been successfully imported before. This is useful if you have edited the module source file using an external editor and want to try out the new version without leaving the Python interpreter. The return value is the module object (the same as the module argument).
However, the module needs to be already loaded. A workaround is to handle the resulting NameError:
try:
    reload(math)
except NameError:
    import math
Write your code to differently-named modules. Writing new code into an existing file and trying to import it again will not work well.
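One way to do that (a sketch; the uuid-based naming is just an illustrative choice):
import importlib
import os
import sys
import uuid

sys.path.insert(0, os.getcwd())      # make the current directory importable
mod_name = 'generated_%s' % uuid.uuid4().hex

with open(mod_name + '.py', 'w') as f:
    f.write('VALUE = 42\n')

importlib.invalidate_caches()        # let the import system notice the new file
module = importlib.import_module(mod_name)
print(module.VALUE)                  # 42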
Alternatively, you can clobber sys.modules. For example:
import sys
import unittest

class MyTestCase(unittest.TestCase):
    def setUp(self):
        # Record sys.modules here so we can restore it in tearDown.
        self.old_modules = dict(sys.modules)

    def tearDown(self):
        # Remove any new modules imported during the test run. This lets us
        # import the same source files for more than one test.
        for m in [m for m in sys.modules if m not in self.old_modules]:
            del sys.modules[m]
I ran into a similar situation. I later found that the whitespace indentation technique used matters. Especially on Windows platforms, ensure that a uniform technique is adopted throughout the module, i.e., use either tabs or spaces exclusively.
