Apache Airflow -- custom class to be used in multiple places - python

I'm using Airflow 1.10.2, and I'm trying to define a custom module containing general functionality that can be used in multiple DAGs as well as operators.
A specific example would be an enum. I want to use it within a custom operator (to modify its behaviour), but I also want to use it within a DAG definition, where it can be passed as a parameter.
This is my current hierarchy:
airflow_home
|- dags/
|  |- __init__.py
|  |- my_dag.py
|- plugins/
   |- operators/
   |  |- __init__.py
   |  |- my_operator.py
   |- common/
      |- __init__.py
      |- my_enum.py
Let's say I want to define an enum (in the my_enum.py module):
from enum import Enum

class MyEnum(Enum):
    OPTION_1 = 1
    OPTION_2 = 2
It is imported into the operator (in my_operator.py) as:
from common.my_enum import MyEnum
And into the DAG (in my_dag.py) the same way:
from common.my_enum import MyEnum
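For context, the DAG then passes the enum to the custom operator, roughly like this (the operator class and its parameter name are hypothetical, since they aren't shown above):

from datetime import datetime

from airflow import DAG
from operators.my_operator import MyOperator  # hypothetical class name
from common.my_enum import MyEnum

with DAG(dag_id="my_dag", start_date=datetime(2019, 1, 1)) as dag:
    task = MyOperator(task_id="my_task", option=MyEnum.OPTION_1)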
Strangely(?), this works for me. However, I'm very uncertain whether this is the correct way of doing such a thing. A colleague told me he tried this in the past (possibly on an older version of Airflow) and it did not work ("broken DAG" when Airflow started). Therefore, I'm afraid it might stop working in the future or under specific conditions, as the module is neither an operator, nor a sensor, etc.
I didn't find any guidelines on how to separate shared behaviour. I find the Airflow import system quite complicated and not very straightforward. My ideal solution would be to move the common module to the same level as dags and operators.
Also, I'm not very sure how to interpret this sentence from the docs: "The python modules in the plugins folder get imported, and hooks, operators, sensors, macros, executors and web views get integrated to Airflow's main collections and become available for use." Does it mean that my approach is correct, because any Python module in plugins/ gets imported?
Is this a good way to achieve my goal, or is there a better solution?
Thank you for your advice

It's a bit of a hackish way of doing it.
The proper way would be to first create a hook, and then an operator which uses this hook. For a simpler case like the one below, you don't even need to call the hook from the operator.
1. Placing
<PROJECT NAME>/<PLUGINS_FOLDER>/<PLUGIN NAME>/__init__.py
<PROJECT NAME>/<PLUGINS_FOLDER>/<PLUGIN NAME>/<some_new>_hook.py
<PROJECT NAME>/<PLUGINS_FOLDER>/<PLUGIN NAME>/<some_new>_operator.py
For real case scenario that would look like this:
CRMProject/crm_plugin/__init__.py
CRMProject/crm_plugin/crm_hook.py
CRMProject/crm_plugin/customer_operator.py
2. Code
Sample code for CRMProject/crm_plugin/__init__.py:
# CRMProject/crm_plugin/__init__.py
from airflow.plugins_manager import AirflowPlugin

from crm_plugin.crm_hook import CrmHook
from crm_plugin.customer_operator import CreateCustomerOperator, DeleteCustomerOperator, UpdateCustomerOperator


class AirflowCrmPlugin(AirflowPlugin):
    name = "crm_plugin"  # does not need to match the package name
    operators = [CreateCustomerOperator, DeleteCustomerOperator, UpdateCustomerOperator]
    sensors = []
    hooks = [CrmHook]
    executors = []
    macros = []
    admin_views = []
    flask_blueprints = []
    menu_links = []
    appbuilder_views = []
    appbuilder_menu_items = []
    global_operator_extra_links = []
    operator_extra_links = []
Sample code for the hook class (CRMProject/crm_plugin/crm_hook.py). Never call it directly from your system/API; use an operator for that (see below).
from airflow.hooks.base_hook import BaseHook
from airflow.exceptions import AirflowException
from crm_sdk import crm_api  # import external libraries to interact with the target system


class CrmHook(BaseHook):
    """
    Hook to interact with the ACME CRM System.
    """
    def __init__(self, ...):
        # your code goes here

    def insert_object(self, ...):
        """
        Insert an object into the CRM system.
        """
        # your code goes here

    def update_object(self, ...):
        """
        Update an object in the CRM system.
        """
        # your code goes here

    def delete_object(self, ...):
        """
        Delete an object from the CRM system.
        """
        # your code goes here

    def extract_object(self, ...):
        """
        Extract an object from the CRM system.
        """
        # your code goes here
Sample code for the operator (CRMProject/crm_plugin/customer_operator.py) which you will use in DAGs. Operators require that you implement an execute method. This is the entry point into your operator for Airflow, and it is called when the task in your DAG executes.
The apply_defaults decorator wraps the __init__ method of the class and applies the DAG defaults, set in your DAG script, to the task instance of your operator at run time.
There are also two important class attributes that we can set: template_fields and template_ext. These two attributes are iterables that should contain the string names of the fields and/or the file extensions that allow templating with the Jinja templating support in Airflow.
from airflow.exceptions import AirflowException
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults

from crm_plugin.crm_hook import CrmHook


class CreateCustomerOperator(BaseOperator):
    """
    This operator creates a new customer in the ACME CRM System.
    """
    template_fields = ['first_contact_date', 'bulk_file']
    template_ext = ['.csv']

    @apply_defaults
    def __init__(self, first_contact_date, bulk_file, ...):
        # your code goes here

    def _customer_exist(self, ...):
        """
        Helper method to check if a customer exists. Raises an exception if it does.
        """
        # your code goes here

    def execute(self, context):
        """
        Create a new customer in the CRM system.
        """
        # your code goes here
You can create as many methods in your class as you need to simplify your execute method. The same principles of good class design still apply here.
3. Deploying and using your plugin
Once you have completed work on your plugin, all that is left for you to do is to copy your <PLUGIN NAME> package folder to the Airflow plugins folder. Airflow will pick the plugin up and it will become available to your DAGs.
If we copied the simple CRM plugin to our plugins_folder, the folder structure would look like this:
<plugins_folder>/crm_plugin/__init__.py
<plugins_folder>/crm_plugin/crm_hook.py
<plugins_folder>/crm_plugin/customer_operator.py
In order to use your new plugin, you simply import your operators and hooks using the following statements:
from airflow.hooks.crm_plugin import CrmHook
from airflow.operators.crm_plugin import CreateCustomerOperator, DeleteCustomerOperator, UpdateCustomerOperator
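To make this concrete, here is a minimal sketch of a DAG using the plugin's operator (the constructor arguments beyond task_id are hypothetical, since the sample __init__ body above is only a placeholder):

from datetime import datetime

from airflow import DAG
from airflow.operators.crm_plugin import CreateCustomerOperator

with DAG(dag_id="crm_example", start_date=datetime(2019, 1, 1), schedule_interval="@daily") as dag:
    create_customer = CreateCustomerOperator(
        task_id="create_customer",
        first_contact_date="{{ ds }}",  # rendered by Jinja (listed in template_fields)
        bulk_file="customers.csv",      # .csv content is templated via template_ext
    )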

Related

How can the default list of email addresses be rendered from a variable in Airflow?

I have a lot of Airflow DAGs and I'd like to automatically send email notifications to the same list of recipients (stored in an Airflow variable) on any task failure, so I'm using the following default operator configuration defined at the DAG level:
dag = DAG(
    ...
    default_args={
        ...
        # single quotes inside the template avoid clashing with the outer double quotes
        "email": "{{ ','.join(var.json.get('my_email_list', [])) }}",
        "email_on_failure": True,
        ...
    },
    ...
)
Unfortunately, it looks like the email argument doesn't support templating and it simply gets passed to the email back-end as-is without rendering, so my approach isn't working.
Could anybody suggest a decent workaround for my particular case, please? I don't really want to hard-code the list of email addresses in the source code, because storing them in an Airflow variable gives much more flexibility.
Here is my somewhat hacky solution. email is not templated in BaseOperator, there is no way to tweak template_fields at the task level (by a task I mean a configured instance of an operator), and I don't really want to define dummy subclasses for each and every built-in operator just to add email to template_fields, e.g.:
from airflow.operators.python import PythonOperator


class MyPythonOperator(PythonOperator):
    template_fields = PythonOperator.template_fields + ("email",)  # this sucks!
So I've decided to stick to the following monkey-patching-like approach to dynamically add custom template fields to all the operators visible/reachable in the current module's scope (the function has to be defined somewhere in a shared/common module and then imported and called at the module level of each DAG):
from airflow.models import BaseOperator


def extend_operator_template_fields_with(
    extra_template_fields,
    base_operator_class=BaseOperator,
) -> None:
    for operator_class in base_operator_class.__subclasses__():
        # Use a dict (w/o values) instead of a set to keep the original order
        # of template fields for the operator class.
        template_fields_dict = dict.fromkeys(operator_class.template_fields)
        template_fields_dict.update(dict.fromkeys(extra_template_fields))
        operator_class.template_fields = tuple(template_fields_dict.keys())
        extend_operator_template_fields_with(extra_template_fields, base_operator_class=operator_class)
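For example, each DAG module would then enable templating for email with a single call at import time (the shared module path here is hypothetical):

# at the module level of each DAG file
from common.template_utils import extend_operator_template_fields_with  # hypothetical shared module

extend_operator_template_fields_with(("email",))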
P.S. I'm still open to a more elegant solution, just haven't been able to find a better one yet (I'm using Airflow 2.2.5).

Mocking elasticsearch-py calls

I'm writing a CLI to interact with elasticsearch using the elasticsearch-py library. I'm trying to mock elasticsearch-py functions in order to test my functions without calling my real cluster.
I read this question and this one but I still don't understand.
main.py
Escli inherits from cliff's App class:
import elasticsearch5
from cliff.app import App  # assumption: App comes from the cliff package


class Escli(App):
    _es = elasticsearch5.Elasticsearch()
settings.py
from escli.main import Escli


class Settings:
    def get(self, sections):
        raise NotImplementedError()


class ClusterSettings(Settings):
    def get(self, setting, persistency='transient'):
        settings = Escli._es.cluster \
            .get_settings(include_defaults=True, flat_settings=True) \
            .get(persistency) \
            .get(setting)
        return settings
settings_test.py
from unittest import TestCase
from unittest.mock import patch

import escli.settings


class TestClusterSettings(TestCase):
    def setUp(self):
        self.patcher = patch('elasticsearch5.Elasticsearch')
        self.MockClass = self.patcher.start()

    def test_get(self):
        # Note this is an empty dict to show my point;
        # it will contain child dicts to allow my .get(persistency).get(setting)
        self.MockClass.return_value.cluster.get_settings.return_value = {}
        cluster_settings = escli.settings.ClusterSettings()
        ret = cluster_settings.get('cluster.routing.allocation.node_concurrent_recoveries', persistency='transient')
        # ret should contain a subset of my dict defined above
I want Escli._es.cluster.get_settings() to return what I want (a dict object) so that the real HTTP call is never made, but it keeps making it.
What I know:
In order to mock an instance method I have to do something like
MagicMockObject.return_value.InstanceMethodName.return_value = ...
I cannot patch Escli._es.cluster.get_settings because Python tries to import Escli as a module, which cannot work. So I'm patching the whole lib.
I desperately tried to put some return_value everywhere but I cannot understand why I can't mock that thing properly.
You should be mocking with respect to where you are testing. Based on the example provided, this means that the Escli class you are using in the settings.py module needs to be mocked with respect to settings.py. So, more practically, your patch call would look like this inside setUp instead:
self.patcher = patch('escli.settings.Escli')
With this, you are now mocking what you want in the right place based on how your tests are running.
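Put together, a minimal sketch of the corrected test (same classes as in the question) could look like this:

from unittest import TestCase
from unittest.mock import patch

import escli.settings


class TestClusterSettings(TestCase):
    def setUp(self):
        # Patch Escli where it is used (escli.settings), not where it is defined.
        self.patcher = patch('escli.settings.Escli')
        self.MockEscli = self.patcher.start()
        self.addCleanup(self.patcher.stop)

    def test_get(self):
        self.MockEscli._es.cluster.get_settings.return_value = {
            'transient': {'cluster.routing.allocation.node_concurrent_recoveries': '2'}
        }
        ret = escli.settings.ClusterSettings().get(
            'cluster.routing.allocation.node_concurrent_recoveries')
        self.assertEqual(ret, '2')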
Furthermore, to add more robustness to your testing, you might want to consider speccing for the Elasticsearch instance you are creating in order to validate that you are in fact calling valid methods that correlate to Elasticsearch. With that in mind, you can do something like this, instead:
self.patcher = patch('escli.settings.Escli', Mock(Elasticsearch))
To read a bit more about what exactly is meant by spec, check the patch section in the documentation.
As a final note, if you are interested in exploring the great world of pytest, there is a pytest-elasticsearch plugin created to assist with this.

Can't call a decorator within the imported sub-class of a cherrypy application (site tree)

I am using cherrypy as a web server, and I want to check a user's logged-in status before returning the page. This works on methods in the main Application class (in site.py) but gives an error when I call the same decorated function on a method in a class that is one layer deeper in the webpage tree (in a separate file).
validate_user() is the function used as a decorator. It either passes a user to the page or sends them to a 401 restricted page, as a cherrypy.Tool, like this:
from user import validate_user
cherrypy.tools.validate_user = cherrypy.Tool('before_handler', validate_user)
I attach different sections of the site to the main site.py file's Application class by assigning instances of the sub-classes as variables accordingly:
from user import UserAuthentication


class Root:
    user = UserAuthentication()  # maps user/login, user/register, user/logout, etc.
    admin = Admin()
    api = Api()

    @cherrypy.expose
    @cherrypy.tools.validate_user()
    def how_to(self, **kw):
        from other_stuff import how_to_page
        return how_to_page(kw)
This, however, does not work when I try to use the validate_user() inside the Admin or Api or Analysis sections. These are in separate files.
import cherrypy


class Analyze:
    @cherrypy.expose
    @cherrypy.tools.validate_user()  #### THIS LINE GIVES ERROR ####
    def explore(self, *args, **kw):  # @addkw(fetch=['uid'])
        import explore
        kw['uid'] = cherrypy.session.get('uid', -1)
        return explore.explorer(args, kw)
The error is that cherrypy.tools doesn't have a validate_user function or method. But other things I assign in site.py do appear in cherrypy here. What's the reason why I can't use this tool in a separate file that is part of my overall site map?
If this is relevant, the validate_user() function simply looks at the cherrypy.request.cookie, finds the 'session_token' value, and compares it to our database and passes it along if the ID matches.
Sorry, I don't know if the Analyze() and Api() and User() pages are subclasses, or nested classes, or extended methods, or what, so I can't give this a precise title. Do I need to pass the parent class in to them somehow?
The issue here is that Python processes everything except the function/method bodies during import. So in site.py, when you import user (or from user import <anything>), all of the user module is processed before the Python interpreter has gotten to the definition of the validate_user tool, including the decorator, which attempts to access that tool by value (rather than by reference).
CherryPy has another mechanism for decorating functions with config that will enable tools on those handlers. Instead of @cherrypy.tools.validate_user, use:
@cherrypy.config(**{"tools.validate_user.on": True})
This decorator works because instead of needing to access validate_user from cherrypy.tools to install itself on the handler, it instead configures CherryPy to install that tool on the handler later, when the handler is invoked.
If that tool is needed for all methods on that class, you can use that config decorator on the class itself.
You could alternatively enable that tool for the given endpoints in the server config, as mentioned in the other question.
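For instance, a minimal sketch of the class-level form mentioned above, reusing the Analyze class from the question:

import cherrypy


@cherrypy.config(**{"tools.validate_user.on": True})
class Analyze:
    @cherrypy.expose
    def explore(self, *args, **kw):
        import explore
        kw['uid'] = cherrypy.session.get('uid', -1)
        return explore.explorer(args, kw)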

True way to use setting in python project [duplicate]

In my endless quest in over-complicating simple stuff, I am researching the most 'Pythonic' way to provide global configuration variables inside the typical 'config.py' found in Python egg packages.
The traditional way (aah, good ol' #define!) is as follows:
MYSQL_PORT = 3306
MYSQL_DATABASE = 'mydb'
MYSQL_DATABASE_TABLES = ['tb_users', 'tb_groups']
Therefore global variables are imported in one of the following ways:
from config import *

dbname = MYSQL_DATABASE
for table in MYSQL_DATABASE_TABLES:
    print(table)
or:
import config

dbname = config.MYSQL_DATABASE
assert isinstance(config.MYSQL_PORT, int)
It makes sense, but sometimes can be a little messy, especially when you're trying to remember the names of certain variables. Besides, providing a 'configuration' object, with variables as attributes, might be more flexible. So, taking a lead from bpython config.py file, I came up with:
class Struct(object):
    def __init__(self, *args):
        self.__header__ = str(args[0]) if args else None

    def __repr__(self):
        if self.__header__ is None:
            return super(Struct, self).__repr__()
        return self.__header__

    def next(self):
        """ Fake iteration functionality.
        """
        raise StopIteration

    def __iter__(self):
        """ Fake iteration functionality.
        We skip magic attributes and Structs, and return the rest.
        """
        ks = self.__dict__.keys()
        for k in ks:
            # check the attribute's value, not the key string, for Struct-ness
            if not k.startswith('__') and not isinstance(getattr(self, k), Struct):
                yield getattr(self, k)

    def __len__(self):
        """ Don't count magic attributes or Structs.
        """
        ks = self.__dict__.keys()
        return len([k for k in ks if not k.startswith('__')
                    and not isinstance(getattr(self, k), Struct)])
and a 'config.py' that imports the class and reads as follows:
from _config import Struct as Section

mysql = Section("MySQL specific configuration")
mysql.user = 'root'
mysql.password = 'secret'  # 'pass' is a reserved keyword, so it can't be an attribute name
mysql.host = 'localhost'
mysql.port = 3306
mysql.database = 'mydb'

mysql.tables = Section("Tables for 'mydb'")
mysql.tables.users = 'tb_users'
mysql.tables.groups = 'tb_groups'
and is used in this way:
from sqlalchemy import MetaData, Table
import config as CONFIG

assert isinstance(CONFIG.mysql.port, int)

mdata = MetaData(
    "mysql://%s:%s@%s:%d/%s" % (
        CONFIG.mysql.user,
        CONFIG.mysql.password,
        CONFIG.mysql.host,
        CONFIG.mysql.port,
        CONFIG.mysql.database,
    )
)

tables = []
for name in CONFIG.mysql.tables:
    tables.append(Table(name, mdata, autoload=True))
Which seems a more readable, expressive and flexible way of storing and fetching global variables inside a package.
Lamest idea ever? What is the best practice for coping with these situations? What is your way of storing and fetching global names and variables inside your package?
How about just using the built-in types like this:
config = {
    "mysql": {
        "user": "root",
        "pass": "secret",
        "tables": {
            "users": "tb_users"
        }
        # etc
    }
}
You'd access the values as follows:
config["mysql"]["tables"]["users"]
If you are willing to sacrifice the potential to compute expressions inside your config tree, you could use YAML and end up with a more readable config file like this:
mysql:
  user: root
  pass: secret
  tables:
    users: tb_users
and use a library like PyYAML to conveniently parse and access the config file.
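For instance, a minimal loading sketch with PyYAML (assuming the snippet above is saved as config.yml):

import yaml  # pip install PyYAML

with open("config.yml") as f:
    config = yaml.safe_load(f)

print(config["mysql"]["tables"]["users"])  # tb_users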
I like this solution for small applications:
class App:
    __conf = {
        "username": "",
        "password": "",
        "MYSQL_PORT": 3306,
        "MYSQL_DATABASE": 'mydb',
        "MYSQL_DATABASE_TABLES": ['tb_users', 'tb_groups']
    }
    __setters = ["username", "password"]

    @staticmethod
    def config(name):
        return App.__conf[name]

    @staticmethod
    def set(name, value):
        if name in App.__setters:
            App.__conf[name] = value
        else:
            raise NameError("Name not accepted in set() method")
And then usage is:
if __name__ == "__main__":
    # from config import App
    App.config("MYSQL_PORT")      # returns 3306
    App.set("username", "hi")     # sets new username value
    App.config("username")        # returns "hi"
    App.set("MYSQL_PORT", "abc")  # this raises NameError
You should like it because it:
- uses class variables (no object to pass around, no singleton required),
- uses encapsulated built-in types and looks like (is) a method call on App,
- has control over individual config immutability; mutable globals are the worst kind of globals,
- promotes conventional and well-named access/readability in your source code,
- is a simple class but enforces structured access; an alternative is to use @property, but that requires more variable-handling code per item and is object-based,
- requires minimal changes to add new config items and set their mutability.
--Edit--:
For large applications, storing values in a YAML (i.e. properties) file and reading that in as immutable data is a better approach (i.e. blubb/ohaal's answer).
For small applications, this solution above is simpler.
How about using classes?
# config.py
class MYSQL:
    PORT = 3306
    DATABASE = 'mydb'
    DATABASE_TABLES = ['tb_users', 'tb_groups']

# main.py
from config import MYSQL

print(MYSQL.PORT)  # 3306
Let's be honest, we should probably consider using a Python Software Foundation maintained library:
https://docs.python.org/3/library/configparser.html
Config example: (ini format, but JSON available)
[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes
[bitbucket.org]
User = hg
[topsecret.server.com]
Port = 50022
ForwardX11 = no
Code example:
>>> import configparser
>>> config = configparser.ConfigParser()
>>> config.read('example.ini')
>>> config['DEFAULT']['Compression']
'yes'
>>> config['DEFAULT'].getboolean('MyCompression', fallback=True) # get_or_else
Making it globally-accessible:
import configparser


class App:
    __conf = None

    @staticmethod
    def config():
        if App.__conf is None:  # Read only once, lazily.
            App.__conf = configparser.ConfigParser()
            App.__conf.read('example.ini')
        return App.__conf


if __name__ == '__main__':
    App.config()['DEFAULT']['MYSQL_PORT']
    # or, better:
    App.config().get(section='DEFAULT', option='MYSQL_PORT', fallback=3306)
Downsides:
Uncontrolled global mutable state.
A small variation on Husky's idea that I use. Make a file called 'globals' (or whatever you like) and then define multiple classes in it, as such:
# globals.py
class dbinfo:  # for database globals
    username = 'abcd'
    password = 'xyz'

class runtime:
    debug = False
    output = 'stdio'
Then, if you have two code files c1.py and c2.py, both can have at the top
import globals as gl
Now all code can access and set values, as such:
gl.runtime.debug = False
print(gl.dbinfo.username)
People forget classes exist, even if no object is ever instantiated that is a member of that class. And variables in a class that aren't preceded by 'self.' are shared across all instances of the class, even if there are none. Once 'debug' is changed by any code, all other code sees the change.
By importing it as gl, you can have multiple such files and variables that lets you access and set values across code files, functions, etc., but with no danger of namespace collision.
This lacks some of the clever error checking of other approaches, but is simple and easy to follow.
Similar to blubb's answer, but I suggest building them with lambda functions to reduce code. Like this:
User = lambda passwd, hair, name: {'password': passwd, 'hair': hair, 'name': name}

#         Username  Password       Hair Color  Real Name
config = {'st3v3': User('password',   'blonde', 'Steve Booker'),
          'blubb': User('12345678',   'black',  'Bubb Ohaal'),
          'suprM': User('kryptonite', 'black',  'Clark Kent'),
          # ...
         }
# ...
config['st3v3']['password']  #> password
config['blubb']['hair']      #> black
This does smell like you may want to make a class, though.
Or, as MarkM noted, you could use namedtuple:
from collections import namedtuple

# ...
User = namedtuple('User', ['password', 'hair', 'name'])

#         Username  Password       Hair Color  Real Name
config = {'st3v3': User('password',   'blonde', 'Steve Booker'),
          'blubb': User('12345678',   'black',  'Bubb Ohaal'),
          'suprM': User('kryptonite', 'black',  'Clark Kent'),
          # ...
         }
# ...
config['st3v3'].password  #> password
config['blubb'].hair      #> black
I did that once. Ultimately I found my simplified basicconfig.py adequate for my needs. You can pass in a namespace with other objects for it to reference if you need to. You can also pass in additional defaults from your code. It also maps attribute and mapping style syntax to the same configuration object.
Please check out the IPython configuration system, implemented via traitlets, for the type enforcement you are doing manually.
Cut and pasted here to comply with SO guidelines against just dropping links, as the content of links changes over time.
traitlets documentation
Here are the main requirements we wanted our configuration system to have:
Support for hierarchical configuration information.
Full integration with command line option parsers. Often, you want to read a configuration file, but then override some of the values with command line options. Our configuration system automates this process and allows each command line option to be linked to a particular attribute in the configuration hierarchy that it will override.
Configuration files that are themselves valid Python code. This accomplishes many things. First, it becomes possible to put logic in your configuration files that sets attributes based on your operating system, network setup, Python version, etc. Second, Python has a super simple syntax for accessing hierarchical data structures, namely regular attribute access (Foo.Bar.Bam.name). Third, using Python makes it easy for users to import configuration attributes from one configuration file to another.
Fourth, even though Python is dynamically typed, it does have types that can be checked at runtime. Thus, a 1 in a config file is the integer ‘1’, while a '1' is a string.
A fully automated method for getting the configuration information to the classes that need it at runtime. Writing code that walks a configuration hierarchy to extract a particular attribute is painful. When you have complex configuration information with hundreds of attributes, this makes you want to cry.
Type checking and validation that doesn’t require the entire configuration hierarchy to be specified statically before runtime. Python is a very dynamic language and you don’t always know everything that needs to be configured when a program starts.
To achieve this they basically define 3 object classes and their relations to each other:
1) Configuration - basically a ChainMap / basic dict with some enhancements for merging.
2) Configurable - base class to subclass all things you'd wish to configure.
3) Application - object that is instantiated to perform a specific application function, or your main application for single purpose software.
In their words:
Application: Application
An application is a process that does a specific job. The most obvious application is the ipython command line program. Each application reads one or more configuration files and a single set of command line options and then produces a master configuration object for the application. This configuration object is then passed to the configurable objects that the application creates. These configurable objects implement the actual logic of the application and know how to configure themselves given the configuration object.
Applications always have a log attribute that is a configured Logger. This allows centralized logging configuration per-application.
Configurable: Configurable
A configurable is a regular Python class that serves as a base class for all main classes in an application. The Configurable base class is lightweight and only does one thing.
This Configurable is a subclass of HasTraits that knows how to configure itself. Class level traits with the metadata config=True become values that can be configured from the command line and configuration files.
Developers create Configurable subclasses that implement all of the logic in the application. Each of these subclasses has its own configuration information that controls how instances are created.
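To illustrate the pattern, here is a minimal sketch using the traitlets package (the class and trait names are made up for the example):

from traitlets import Int, Unicode
from traitlets.config import Application, Configurable


class Database(Configurable):
    # Traits tagged config=True can be set from config files or the command line.
    host = Unicode("localhost").tag(config=True)
    port = Int(3306).tag(config=True)


class MyApp(Application):
    classes = [Database]

    def start(self):
        # The master configuration object is handed to each configurable.
        db = Database(config=self.config)
        print(db.host, db.port)


if __name__ == "__main__":
    app = MyApp()
    app.initialize()  # parses the command line, e.g. --Database.port=3307
    app.start()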

How to get Registry().settings during Pyramid app startup time?

I am used to developing web applications with Django and gunicorn.
In the case of Django, any application module can get deployment settings through django.conf.settings. The "settings.py" is written in Python, so any arbitrary settings and pre-processing can be defined dynamically.
In the case of gunicorn, it has three configuration places in order of precedence, and one settings registry class instance combines them (but usually these settings are used only for gunicorn, not the application):
1. Command line parameters.
2. Configuration file (like Django, written in Python, which can have any arbitrary settings dynamically).
3. Paster application settings.
In the case of Pyramid, according to the Pyramid documentation, deployment settings are usually put into pyramid.registry.Registry().settings. But it seems to be accessible only when a pyramid.router.Router() instance exists.
That is, pyramid.threadlocal.get_current_registry().settings returns None during the startup process in an application's "main.py".
For example, I usually define some business logic in SQLAlchemy model modules, which requires deployment settings as follows.
myapp/models.py
from sqlalchemy import Table, Column, Types
from sqlalchemy.orm import mapper
from pyramid.threadlocal import get_current_registry
from myapp.db import session, metadata

settings = get_current_registry().settings

mytable = Table('mytable', metadata,
    Column('id', Types.INTEGER, primary_key=True,),
    (other columns)...
)

class MyModel(object):
    query = session.query_property()
    external_api_endpoint = settings['external_api_uri']
    timezone = settings['timezone']

    def get_api_result(self):
        (interact with external api ...)

mapper(MyModel, mytable)
But, "settings['external_api_endpoint']" raises a TypeError exception because the "settings" is None.
I thought of two solutions.
1. Define a callable which accepts a "config" argument in "models.py", and have "main.py" call it with a Configurator() instance.
myapp/models.py
from sqlalchemy import Table, Column, Types
from sqlalchemy.orm import mapper
from myapp.db import session, metadata

_g = globals()

def initialize(config):
    settings = config.get_settings()

    mytable = Table('mytable', metadata,
        Column('id', Types.INTEGER, primary_key=True,),
        (other columns ...)
    )

    class MyModel(object):
        query = session.query_property()
        external_api_endpoint = settings['external_api_endpoint']

        def get_api_result(self):
            (interact with external api)...

    mapper(MyModel, mytable)

    _g['MyModel'] = MyModel
    _g['mytable'] = mytable
2. Or, put an empty module "myapp/settings.py" in place, and put the settings into it later.
myapp/__init__.py
from pyramid.config import Configurator
from .resources import RootResource

def main(global_config, **settings):
    config = Configurator(
        settings=settings,
        root_factory=RootResource,
    )
    import myapp.settings
    myapp.settings.settings = config.get_settings()
    (other configurations ...)
    return config.make_wsgi_app()
Both (and other) solutions meet the requirements, but I find them troublesome. What I want is the following.
development.ini
defines rough settings, because development.ini can have only string-type constants:
[app:myapp]
use = egg:myapp
env = dev0
api_signature = xxxxxx
myapp/settings.py
defines detailed settings based on development.ini, because arbitrary variables (of any type) can be set:
import datetime, urllib
from pytz import timezone
from pyramid.threadlocal import get_current_registry

pyramid_settings = get_current_registry().settings

if pyramid_settings['env'] == 'production':
    api_endpoint_uri = 'http://api.external.com/?{0}'
    timezone = timezone('US/Eastern')
elif pyramid_settings['env'] == 'dev0':
    api_endpoint_uri = 'http://sandbox0.external.com/?{0}'
    timezone = timezone('Australia/Sydney')
elif pyramid_settings['env'] == 'dev1':
    api_endpoint_uri = 'http://sandbox1.external.com/?{0}'
    timezone = timezone('Asia/Tokyo')

api_endpoint_uri = api_endpoint_uri.format(urllib.urlencode({'signature': pyramid_settings['api_signature']}))
Then, other modules can get arbitrary deployment settings through "import myapp.settings".
Or, if Registry().settings is preferable to "settings.py", the **settings kwargs and "settings.py" may be combined and registered into Registry().settings during the "main.py" startup process.
Anyway, how do I get the settings dictionary during startup time? Or does Pyramid gently force us to put every piece of code that requires deployment settings into "view" callables, which can get the settings dictionary anytime through request.registry.settings?
EDIT
Thanks, Michael and Chris.
I at last understand why Pyramid uses threadlocal variables (registry and request), in particular the registry object for more than one Pyramid application.
In my opinion, however, deployment settings usually affect business logic that may define application-specific things. That logic is usually put in one or more Python modules other than "app/__init__.py" or "app/views.py", which can easily get access to Config() or Registry(). Those Python modules are normally "global" at the Python process level.
That is, even when more than one Pyramid application coexists, despite their own threadlocal variables, they have to share those "global" Python modules that may contain application-specific things at the Python process level.
Of course, every such module can have an initialize() callable which is called with a Configurator() by the application's "main" callable, or passing a Registry() or Request() object through a long series of function calls can meet the usual requirements. But I guess Pyramid beginners (like me), or developers who have large applications or very many settings, may find that troublesome, although that is the Pyramid design.
So, I think Registry().settings should have only real "thread-local" variables, and should not have normal business-logic settings. Responsibility for segregating application-specific modules, classes, callables, variables, etc. should be taken by the developer.
As of now, from my viewpoint, I will take Chris's answer. Or, in the "main" callable, do "execfile('settings.py', settings, settings)" and put it in some "global" space.
Another option, if you enjoy global configuration via Python, create a settings.py file. If it needs values from the ini file, parse the ini file and grab them out (at module scope, so it runs at import time):
from paste.deploy.loadwsgi import appconfig

config = appconfig('config:development.ini', 'myapp', relative_to='.')

if config['env'] == 'production':
    api_endpoint_uri = 'http://api.external.com/?{0}'
    timezone = timezone('US/Eastern')
# .. and so on ...
'config:development.ini' is the name of the ini file (prefixed with 'config:'). 'myapp' is the section name in the config file representing your app (e.g. [app:myapp]). "relative_to" is the directory name in which the config file can be found.
The pattern that I use is to pass the Configurator to modules that need to be initialized. Pyramid doesn't use any global variables because a design goal is to be able to run multiple instances of Pyramid in the same process. The threadlocals are global, but they are local to the current request, so different Pyramid apps can push to them at the same time from different threads.
With this in mind, if you do want a global settings dictionary you'll have to take care of that yourself. You could even push the registry onto the threadlocal manager yourself by calling config.begin().
I think the major thing to take away here is that you shouldn't be calling get_current_registry() at the module level, because at import time you aren't really guaranteed that the threadlocals are initialized. However, if you call get_current_registry() inside your init_model() function, you'd be fine, provided you previously called config.begin().
Sorry this is a little convoluted, but it's a common question and the best answer is: pass the configurator to your submodules that need it and allow them to add stuff to the registry/settings objects for use later.
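A minimal sketch of that pattern, following the question's module layout (the initialize() hook is a convention for this example, not a Pyramid API):

# myapp/__init__.py
from pyramid.config import Configurator

def main(global_config, **settings):
    config = Configurator(settings=settings)
    # Hand the configurator to each submodule that needs settings at startup.
    import myapp.models
    myapp.models.initialize(config)
    return config.make_wsgi_app()


# myapp/models.py
def initialize(config):
    settings = config.get_settings()
    # ... build module-level objects (tables, models) from settings here ...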
Pyramid uses static configuration with PasteDeploy, unlike Django.
Your [EDIT] part is a nice solution; I think the Pyramid community should consider such usage.
