I want to build a well-modularized Python project, where every alternative module is registered and accessed via a function named xxx_builder.
Taking a data class as an example:
register.py:
import logging

logger = logging.getLogger(__name__)

def register(key, module, module_dict):
    """Register and maintain the data classes."""
    if key in module_dict:
        logger.warning(
            'Key {} is already pre-defined, overwritten.'.format(key))
    module_dict[key] = module

data_dict = {}

def register_data(key, module):
    register(key, module, data_dict)
data.py:
from register import register_data
import ABCDEF

class MyData:
    """An alternative data class."""
    pass

def call_my_data(data_type):
    if data_type == 'mydata':
        return MyData

register_data('mydata', call_my_data)
builder.py:
import register

def get_data(type):
    """Obtain the corresponding data class."""
    for func in register.data_dict.values():
        data = func(type)
        if data is not None:
            return data
main.py:
from data import MyData
from builder import get_data

if __name__ == '__main__':
    data_type = 'mydata'
    data = get_data(type=data_type)
My problem
In main.py, to register the MyData class into register.data_dict before calling get_data, I have to import data.py in advance so that register_data('mydata', call_my_data) is executed.
It's okay when the project is small and all the data-related classes follow some placement rule (e.g. every data-related class lives under the directory data), so that I can import them all in advance.
However, this registration mechanism means that every data-related class gets imported, so I have to install all of their dependencies even if I never actually use them. For example, even when data_type in main.py is not 'mydata', I still need to install the ABCDEF package, because importing data.py pulls it in for the MyData class.
So is there a good way to avoid importing all the packages?
Python's packaging tools come with a solution for this: entry points. There's even a tutorial on using entry points for plugins (which seems to be what you're doing), in conjunction with this Setuptools tutorial.
IOW, something like this (nb. untested), if you have a plugin package that has defined
[options.entry_points]
myapp.data_class =
    someplugindata = my_plugin.data_classes:SomePluginData
in setup.cfg (or pyproject.toml or setup.py, with their respective syntaxes), you could register all of these plugin classes (here shown with an example with a locally registered class too).
from importlib.metadata import entry_points

data_class_registry = {}

def register(key):
    def decorator(func):
        data_class_registry[key] = func
        return func
    return decorator

@register("mydata")
class MyData:
    ...

def register_from_entrypoints():
    for entrypoint in entry_points(group="myapp.data_class"):
        register(entrypoint.name)(entrypoint.load())

def get_constructor(type):
    return data_class_registry[type]

def main():
    register_from_entrypoints()
    get_constructor("mydata")(...)
    get_constructor("someplugindata")(...)
Related
My Python package depends on an external library for a few of its functions. This is a non-Python package that can be difficult to install, so I'd like users to still be able to use my package, but have it fail only when calling functions that depend on the non-Python package.
What is the standard practice for this? I could import the non-Python package only inside the methods that use it, but I really hate doing this.
My current setup:
myInterface.py
myPackage/
--classA.py
--classB.py
The interface script myInterface.py imports classA and classB, and classB imports the non-Python package. If the import fails, I print a warning. If myMethod is called and the package isn't installed, there will be some error downstream, but I do not catch it anywhere, nor do I warn the user.
classB is imported every time the interface script is called, so I can't have anything fail there, which is why I included the pass. Like I said above, I could import inside the method and have it fail there, but I really like keeping all of my imports in one place.
From classB.py
try:
    import someWeirdPackage
except ImportError:
    print("Cannot import someWeirdPackage")
    pass

class ClassB():
    ...

    def myMethod(self):
        swp = someWeirdPackage()
        ...
If you are only importing one external library, I would go for something along these lines:
try:
    import weirdModule
    available = True
except ImportError:
    available = False

def func_requiring_weirdmodule():
    if not available:
        raise ImportError('weirdModule not available')
    ...
The conditional and error checking is only needed if you want to give more descriptive errors. If not, you can omit it and let Python throw the corresponding error when trying to call a non-imported module, as you do in your current setup.
If multiple functions use weirdModule, you can wrap the check in a function:
def require_weird_module():
    if not available:
        raise ImportError('weirdModule not available')

def f1():
    require_weird_module()
    ...

def f2():
    require_weird_module()
    ...
On the other hand, if you have multiple libraries to be imported by different functions, you can load them dynamically. Although it doesn't look pretty, Python caches the modules, and there is nothing wrong with it. I would use importlib:
import importlib

def func_requiring_weirdmodule():
    weirdModule = importlib.import_module('weirdModule')
Again, if several of your functions import complicated external modules, you can wrap the import in a helper:
def import_external(name):
    return importlib.import_module(name)

def f1():
    weird1 = import_external('weirdModule1')

def f2():
    weird2 = import_external('weirdModule2')
And last, you could create a handler to prevent importing the same module twice, something along the lines of:
class Importer(object):
    __loaded__ = {}

    @staticmethod
    def import_external(name):
        if name in Importer.__loaded__:
            return Importer.__loaded__[name]
        mod = importlib.import_module(name)
        Importer.__loaded__[name] = mod
        return mod

def f1():
    weird = Importer.import_external('weird1')

def f2():
    weird = Importer.import_external('weird1')
Although I'm pretty sure that importlib does its own caching behind the scenes (in sys.modules), so you don't really need the manual caching.
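A quick way to confirm that caching behaviour, using only the standard library:

import importlib
import sys

m1 = importlib.import_module('json')
m2 = importlib.import_module('json')

# Both calls return the very same module object: the first import stored
# it in sys.modules, and the second call simply looked it up there.
assert m1 is m2
assert sys.modules['json'] is m1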
In short, although it does look ugly, there is nothing wrong with importing modules dynamically in Python; in fact, a lot of libraries rely on this. On the other hand, if it is just a special case of three methods accessing one external function, use your approach, or my first one if you want to add custom exception handling.
I'm not really sure that there's any best practice in this situation, but I would redefine the function if it's not supported:
def warn_import():
    print("Cannot import someWeirdPackage")

try:
    import someWeirdPackage
    external_func = someWeirdPackage
except ImportError:
    external_func = warn_import

class ClassB():
    def myMethod(self):
        swp = external_func()

b = ClassB()
b.myMethod()
You can create two separate classes for the two cases. The first is used when the package exists; the second when it does not.
class ClassB1():
    def myMethod(self):
        print("someWeirdPackage exists")
        # do something

class ClassB2(ClassB1):
    def myMethod(self):
        print("someWeirdPackage does not exist")
        # do something or raise Exception

try:
    import someWeirdPackage

    class ClassB(ClassB1):
        pass
except ImportError:
    class ClassB(ClassB2):
        pass
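A quick usage sketch of the snippet above: whichever branch of the try/except ran decides which myMethod a caller gets.

b = ClassB()
b.myMethod()
# Prints "someWeirdPackage exists" if the import above succeeded,
# and "someWeirdPackage does not exist" otherwise.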
You can also use the approach given below to overcome the problem you're facing.
class UnAvailableName(object):
    def __init__(self, name):
        self.target = name

    def __getattr__(self, attr):
        # Raised lazily: the import failure stays harmless until some
        # attribute of the missing package is actually accessed.
        raise ImportError("{} is not available.".format(self.target))

try:
    import someWeirdPackage
except ImportError:
    print("Cannot import someWeirdPackage")
    someWeirdPackage = UnAvailableName("someWeirdPackage")

class ClassB():
    def myMethod(self):
        swp = someWeirdPackage.hello()

a = ClassB()
a.myMethod()
I'm writing unit tests to validate my project's functionality. I need to replace some of the functions with mock functions, and I thought I would use the Python mock library. The implementation I used doesn't seem to work properly, though, and I don't understand where I'm going wrong. Here is a simplified scenario:
root/connector.py
from ftp_utils.py import *

def main():
    config = yaml.safe_load("vendor_sftp.yaml")
    downloaded_files = []
    downloaded_files = get_files(config)
    for f in downloaded_files:
        # do something
root/utils/ftp_utils.py
import os
import sys
import pysftp

def get_files(config):
    sftp = pysftp.Connection(config['host'], username=config['username'])
    sftp.chdir(config['remote_dir'])
    down_files = sftp.listdir()
    if down_files is not None:
        for f in down_files:
            sftp.get(f, os.path.join(config['local_dir'], f), preserve_mtime=True)
    return down_files
root/tests/connector_tester.py
import unittest
import mock
import ftp_utils
import connector

def get_mock_files():
    return ['digital_spend.csv', 'tv_spend.csv']

class ConnectorTester(unittest.TestCase):
    @mock.patch('ftp_utils.get_files', side_effect=get_mock_files)
    def test_main_process(self, get_mock_files_function):
        # I want to use a mock version of the get_files function
        connector.main()
When I debug my test I expect the get_files function called inside connector.main() to be get_mock_files(), but instead it is ftp_utils.get_files(). What am I doing wrong here? What should I change in my code to properly call the get_mock_files() mock?
Thanks,
Alessio
I think there are several problems with your scenario:
connector.py cannot import from ftp_utils.py that way
nor can connector_tester.py
as a habit, it is better to name your test files test_xxx.py
to use unittest with patching, see this example
In general, try to provide working minimal examples so that it is easier for everyone to run your code.
I modified your example rather heavily to make it work, but basically the problem is that you patch 'ftp_utils.get_files', while that is not the reference actually called inside connector.main(): because of the star import, the name used there is 'connector.get_files'.
Here is the modified example's directory:
test_connector.py
ftp_utils.py
connector.py
test_connector.py:
import unittest
import sys
import mock
import connector

def get_mock_files(*args, **kwargs):
    return ['digital_spend.csv', 'tv_spend.csv']

class ConnectorTester(unittest.TestCase):
    def setUp(self):
        self.patcher = mock.patch('connector.get_files', side_effect=get_mock_files)
        self.patcher.start()
        # Make sure the patch is undone after each test
        self.addCleanup(self.patcher.stop)

    def test_main_process(self):
        # I want to use a mock version of the get_files function
        connector.main()

suite = unittest.TestLoader().loadTestsFromTestCase(ConnectorTester)

if __name__ == "__main__":
    unittest.main()
NB: what is called when running connector.main() is 'connector.get_files'
connector.py:
from ftp_utils import *

def main():
    config = None
    downloaded_files = []
    downloaded_files = get_files(config)
    for f in downloaded_files:
        print(f)
connector/ftp_utils.py unchanged.
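For completeness, the same patch can be applied per test with the decorator form instead of setUp; a minimal sketch against the same files (my own addition, not part of the original answer):

import unittest
import mock
import connector

def get_mock_files(*args, **kwargs):
    return ['digital_spend.csv', 'tv_spend.csv']

class ConnectorTesterDecorated(unittest.TestCase):
    # Patch the name where it is looked up (connector), not where it is defined
    @mock.patch('connector.get_files', side_effect=get_mock_files)
    def test_main_process(self, mock_get_files):
        connector.main()
        mock_get_files.assert_called_once()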
Trying to add a few imports to my IPython profile so that they're always loaded when I open a kernel in the Spyder IDE. Spyder has a Qt interface (I think??), so I (a) checked that I was in the right directory for the profile using the ipython locate command in the terminal (OSX), and (b) placed the following code in my ipython_qtconsole_config.py file:
c.IPythonQtConsoleApp.exec_lines = ["import pandas as pd",
                                    "pd.set_option('io.hdf.default_format', 'table')",
                                    "pd.set_option('mode.chained_assignment','raise')",
                                    "from __future__ import division, print_function"]
But when I open a new window and type pd.__version__ I get NameError: name 'pd' is not defined.
Edit: I don't have any problems if I run ipython qtconsole from the Terminal.
Suggestions?
Thanks!
Whether Spyder uses a Qt interface or not shouldn't be related to which of the IPython config files you want to modify. The one you chose to modify, ipython_qtconsole_config.py, is the configuration file loaded when you launch IPython's Qt console, such as with the command-line command
user@system:~$ ipython qtconsole
(I needed to update pyzmq for this to work.)
If Spyder maintains a running IPython kernel and merely manages how to display it for you, then Spyder is probably just maintaining a regular IPython session, in which case you want your configuration settings to go into the file ipython_config.py in the same directory where you found ipython_qtconsole_config.py.
I manage this slightly differently than you do. Inside of ipython_config.py the top few lines for me look like this:
# Configuration file for ipython.
from os.path import join as pjoin
from IPython.utils.path import get_ipython_dir

c = get_config()

c.InteractiveShellApp.exec_files = [
    pjoin(get_ipython_dir(), "profile_default", "launch.py")
]
What this does is obtain the IPython configuration directory, add on the profile_default subdirectory, and then add on the name launch.py, which is a file I created just to hold anything I want executed/loaded on startup.
For example, here's the first bit from my file launch.py:
"""
IPython launch script
Author: Ely M. Spears
"""
import re
import os
import abc
import sys
import mock
import time
import types
import pandas
import inspect
import cPickle
import unittest
import operator
import warnings
import datetime
import dateutil
import calendar
import copy_reg
import itertools
import contextlib
import collections
import numpy as np
import scipy as sp
import scipy.stats as st
import scipy.weave as weave
import multiprocessing as mp
from IPython.core.magic import (
    Magics,
    register_line_magic,
    register_cell_magic,
    register_line_cell_magic
)
from dateutil.relativedelta import relativedelta as drr
###########################
# Pickle/Unpickle methods #
###########################
# See explanation at:
# <http://bytes.com/topic/python/answers/552476-why-cant-you-pickle-instancemethods>
def _pickle_method(method):
    func_name = method.im_func.__name__
    obj = method.im_self
    cls = method.im_class
    return _unpickle_method, (func_name, obj, cls)

def _unpickle_method(func_name, obj, cls):
    for cls in cls.mro():
        try:
            func = cls.__dict__[func_name]
        except KeyError:
            pass
        else:
            break
    return func.__get__(obj, cls)

copy_reg.pickle(types.MethodType, _pickle_method, _unpickle_method)
#############
# Utilities #
#############
def interface_methods(*methods):
    """
    Class decorator that can decorate an abstract base class with method names
    that must be checked in order for isinstance or issubclass to return True.
    """
    def decorator(Base):
        def __subclasshook__(Class, Subclass):
            if Class is Base:
                # Flatten the attribute names across the whole MRO so the
                # membership test below checks individual names.
                all_ancestor_attrs = [attr
                                      for ancestor_class in Subclass.__mro__
                                      for attr in ancestor_class.__dict__]
                if all(method in all_ancestor_attrs for method in methods):
                    return True
            return NotImplemented
        Base.__subclasshook__ = classmethod(__subclasshook__)
        return Base
    return decorator
def interface(*attributes):
    """
    Class decorator checking for any kind of attributes, not just methods.

    Usage:

    @interface('foo', 'bar', 'baz')
    class Blah:
        pass

    Now, new classes will be treated as if they are subclasses of Blah, and
    instances will be treated as instances of Blah, provided they possess the
    attributes 'foo', 'bar', and 'baz'.
    """
    def decorator(Base):
        def checker(Other):
            return all(hasattr(Other, a) for a in attributes)
        def __subclasshook__(cls, Other):
            if checker(Other):
                return True
            return NotImplemented
        def __instancecheck__(cls, Other):
            return checker(Other)
        Base.__metaclass__.__subclasshook__ = classmethod(__subclasshook__)
        Base.__metaclass__.__instancecheck__ = classmethod(__instancecheck__)
        return Base
    return decorator
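Since interface_methods targets abstract base classes, its __subclasshook__ is only consulted when the decorated class uses ABCMeta; a quick usage sketch (my own, in the same Python 2 style as the file above):

import abc

@interface_methods('quack', 'walk')
class Duck(object):
    __metaclass__ = abc.ABCMeta

class Mallard(object):
    def quack(self): pass
    def walk(self): pass

# Mallard counts as a Duck subclass purely because it has the required
# methods; no inheritance relationship exists between the two classes.
assert issubclass(Mallard, Duck)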
There's a lot more, probably dozens of helper functions, snippets of code I've thought are cool and just want to play with, etc. I also define some randomly generated toy data sets, like NumPy arrays and Pandas DataFrames, so that when I want to poke around with some one-off Pandas syntax or something, some toy data is always right there.
The other upside is that this factors out the custom imports, function definitions, etc. that I want loaded, so if I want the same things loaded for the notebook and/or the qt console, I can just add the same bit of code to exec the file launch.py and I can make changes in only launch.py without having to manually migrate them to each of the three configuration files.
I also uncomment a few of the different settings, especially for plain IPython and for the notebook, so the config files are meaningfully different from each other, just not in terms of what modules I want imported on startup.
I'm currently writing a tiny API to support extending our module classes. Users should be able to just write their class name in a config and it gets used in our program. The contract is that the class's module has a function called create(**kwargs) that returns an instance of our base module class, and is placed in a special folder. But the isinstance check fails as soon as the import is made dynamically.
modules are placed in lib/services/name
module base class (in lib/services/service)
class Service:
    def __init__(self, **kwargs):
        # some initialization
        pass
example module class (in lib/services/ping)
class PingService(Service):
    def __init__(self, **kwargs):
        Service.__init__(self, **kwargs)
        # uninteresting init

def create(kwargs):
    return PingService(**kwargs)
importing function
import sys
from lib.services.service import Service

def doimport(clazz, modPart, kw, class_check):
    path = "lib/" + modPart
    sys.path.append(path)
    mod = __import__(clazz)
    item = mod.create(kw)
    if class_check(item):
        print "im happy"
    return item
calling code
class_check = lambda service: isinstance(service, Service)

s = doimport("ping", "services", {}, class_check)
print s

from lib.services.ping import create

pingService = create({})
if isinstance(pingService, Service):
    print "why this?"
What the hell am I doing wrong?
Here is a small example zipped up; just extract and run test.py without arguments:
zip example
The problem was in your ping.py file. I don't know exactly why, but when importing dynamically it was not accepting the line from service import Service, so you just have to change it to the fully qualified path: from lib.services.service import Service. (When the same module ends up imported under two different names, Python creates two separate copies of the Service class, and isinstance comparisons between them fail.) Adding lib/services to sys.path could not make the inheritance work either, which I found strange...
Also, I am using imp.load_source which seems more robust:
import os, imp

def doimport(clazz, modPart, kw, class_check):
    path = os.path.join('lib', modPart, clazz + '.py')
    mod = imp.load_source(clazz, path)
    item = mod.create(kw)
    if class_check(item):
        print "im happy"
    return item
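Note that imp has since been deprecated (and removed in Python 3.12). On Python 3, a rough equivalent of imp.load_source, sketched here with the same hypothetical lib/services layout, is the importlib.util recipe:

import importlib.util
import os
import sys

def load_source(module_name, path):
    # Load a module from an explicit file path, as imp.load_source did.
    spec = importlib.util.spec_from_file_location(module_name, path)
    mod = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = mod  # register so nested imports resolve
    spec.loader.exec_module(mod)
    return mod

mod = load_source('ping', os.path.join('lib', 'services', 'ping.py'))
item = mod.create({})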
Maybe it's lack of sleep, but I feel silly that I can't get this. I have a plugin; I see it get loaded, but I can't instantiate it in my main file:
from transformers.FOMIBaseClass import find_plugins, register
find_plugins()
Here's my FOMIBaseClass:
from PluginBase import MountPoint
import sys
import os

class FOMIBaseClass(object):
    __metaclass__ = MountPoint

    def __init__(self):
        pass

    def init_plugins(self):
        pass

def find_plugins():
    plugin_dir = os.path.dirname(os.path.realpath(__file__))
    plugin_files = [x[:-3] for x in os.listdir(plugin_dir) if x.endswith("Transformer.py")]
    sys.path.insert(0, plugin_dir)
    for plugin in plugin_files:
        mod = __import__(plugin)
Here's my MountPoint:
class MountPoint(type):
    def __init__(cls, name, bases, attrs):
        if not hasattr(cls, 'plugins'):
            cls.plugins = []
        else:
            cls.plugins.append(cls)
I see it being loaded:
# /Users/carlos/Desktop/ws_working_folder/python/transformers/SctyDistTransformer.pyc matches /Users/carlos/Desktop/ws_working_folder/python/transformers/SctyDistTransformer.py
import SctyDistTransformer # precompiled from /Users/carlos/Desktop/ws_working_folder/python/transformers/SctyDistTransformer.pyc
But, for the life of me, I can't instantiate the 'SctyDistTransformer' class from the main file. I know I'm missing something trivial. Basically, I want to employ a class-loading plugin mechanism.
To dynamically load Python modules from arbitrary folders, use the imp module:
http://docs.python.org/library/imp.html
Specifically the code should look like:
import imp

mod = imp.load_source("MyModule", "MyModule.py")
clz = getattr(mod, "MyClassName")
Also, if you are building a serious plug-in architecture, I recommend using Python eggs and entry points:
http://wiki.pylonshq.com/display/pylonscookbook/Using+Entry+Points+to+Write+Plugins
https://github.com/miohtama/vvv/blob/master/vvv/main.py#L104
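Putting the pieces together, here is a minimal sketch of the mount-point pattern in modern Python (my own reconstruction, with hypothetical file names). Note that the __metaclass__ attribute from the question is Python 2 syntax and is silently ignored by Python 3; the metaclass= keyword is required for the registry to fill up:

import importlib.util
import sys

class MountPoint(type):
    """Metaclass that records every subclass of its base class in plugins."""
    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        if not hasattr(cls, 'plugins'):
            cls.plugins = []          # first class seen: the base itself
        else:
            cls.plugins.append(cls)   # each later subclass registers here

class FOMIBaseClass(metaclass=MountPoint):
    pass

def load_plugin(name, path):
    # Import a plugin file so its classes get defined (and thereby
    # registered by the metaclass when they subclass FOMIBaseClass).
    spec = importlib.util.spec_from_file_location(name, path)
    mod = importlib.util.module_from_spec(spec)
    sys.modules[name] = mod
    spec.loader.exec_module(mod)
    return mod

# e.g. load_plugin("SctyDistTransformer", "transformers/SctyDistTransformer.py")
for plugin_cls in FOMIBaseClass.plugins:
    instance = plugin_cls()  # instantiate every discovered plugin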