Problem
I'm seeing examples that use global variables in Blueprints to store objects used by the Blueprint. However, this causes problems in unit testing when multiple instances of an application are created.
To illustrate, below is an example where parsed files are cached, based on a setting during blueprint initialization (Note: I know there are many issues with the below code, it's just an example):
from pathlib import Path
import json
from flask import Blueprint, jsonify
from whatevermodule import JSONFileCache
bp = Blueprint('cached_file_loader', __name__, url_prefix='/files')
json_cache = JSONFileCache()
@bp.record
def init(setup_state):
    if not setup_state.options['use_caching']:
        json_cache.disable()

@bp.route('/<path:path>')
def view_json(path):
    localpath = Path('/tmp', path.lstrip('/'))
    data = json_cache.get(localpath)
    if data is None:
        with open(localpath, 'r') as jsonfile:
            data = json.load(jsonfile)
        json_cache.set(localpath, data)
    return jsonify(data)
The problem is, in this example a global variable 'json_cache' is used. This means if I create multiple instances of an app with this blueprint, using different caching settings, these will interfere, causing tests to fail.
What I've tried/considered
1. Keep using global variables, and just use one application per test file. This is actually what I'm using currently. However, since pytest collects all tests in one process, I have to do
   find ./tests -iname "test_*.py" -print0 | xargs -0 -n1 pytest
   to make sure everything runs.
2. Storing the objects (in this case the json_cache) on the Blueprint instance. This is however not possible, since the 'bp' variable is also a global variable that will be reused...
3. Inside the 'init' function, storing the objects inside setup_state.app.config. This should work, since this variable is initialized per application. The docs for configuration handling actually say:
   That way you can create multiple instances of your application with different configurations attached which makes unit testing a lot easier.
Although option (3) should work, it looks very 'hacky' to me to store functional objects inside a dictionary marked as 'config', where I would only expect configuration variables.
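For completeness, option (3) would look roughly like the sketch below (reusing JSONFileCache and the 'use_caching' option from the example above; the 'JSON_CACHE' config key is just an illustrative name):
from flask import Blueprint, current_app, jsonify
from whatevermodule import JSONFileCache

bp = Blueprint('cached_file_loader', __name__, url_prefix='/files')

@bp.record
def init(setup_state):
    cache = JSONFileCache()
    if not setup_state.options.get('use_caching', True):
        cache.disable()
    setup_state.app.config['JSON_CACHE'] = cache   # one cache per application instance

@bp.route('/<path:path>')
def view_json(path):
    json_cache = current_app.config['JSON_CACHE']  # look up the per-app cache
    ...                                            # rest of the view as before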
Question
Is there a better way, one that looks less 'hacky' than option (3), to store objects used inside blueprints at the app level during initialization?
Related
I want to define a bunch of config variables that can be imported in all the modules in my project. The values of those variables will be constant during runtime but are not known before runtime; they depend on the input. Usually I'd define a dict in my top module which would be passed to all functions and classes from other modules; however, I was thinking it may be cleaner to simply create a blank config.py module which would be dynamically filled with config variables by the top module:
# top.py
import config
config.x = x
# config.py
x = None
# other.py
import config
print(config.x)
I like this approach because I don't have to save the parameters as attributes of classes in my other modules, which makes sense to me because parameters do not describe the classes themselves.
This works but is it considered bad practice?
The question as such may be disputed, but I would generally say yes, it's "bad practice", because the scope and impact of changes really get blurred. Note that the use case you're describing is not really about sharing configuration, but about different parts of the program (functions, objects, modules) exchanging data, and as such it's a bit of a variation on a (meta) global variable.
Reading common configuration values could be fine, but changing them along the way... you may lose track of what happened where, and in which order, as modules get imported and values get modified. For instance, assume the config.py above and two modules, m1.py:
import config
print(config.x)
config.x=1
and m2.py:
import config
print(config.x)
config.x=2
and a main.py that just does:
import m1
import m2
import config
print(config.x)
or:
import m2
import m1
import config
print(config.x)
The state in which you find config in each module (and really in any other, incl. main.py here) depends on the order in which the imports occurred and on who assigned what value when. Even for a program entirely under your control, this may get confusing (and a source of mistakes) rather quickly.
For runtime data and for passing information between objects and modules (and your example really is that, not configuration that is predefined and shared between modules), I would suggest describing the information in, say, a custom state (config) object and passing it around through an appropriate interface. But really, just a function / method argument may be all that is needed. The exact form depends on what exactly you're trying to achieve and what your overall design is.
In your example, other.py behaves differently when called or imported before top.py, which may still seem obvious and manageable in a minimal example, but really is not a very sound design. Anyone reading the code (incl. future you) should be able to follow its logic, and this IMO breaks its flow.
The most trivial (and procedural) example of what you've described (and which I hopefully now have a better grasp of) would be an other.py recreating your current behavior:
def do_stuff(value):
    print(value)  # We did something useful here

if __name__ == "__main__":
    do_stuff(None)  # Could also use config with defaults
And your top.py, presumably being the entry point and orchestrating importing and execution, doing:
import other
x = get_the_value()
other.do_stuff(x)
You can of course introduce an interface to configure do_stuff, perhaps a dict or a custom class, even with a default implementation in config.py:
class Params:
    def __init__(self, x=None):
        self.x = x
and your other.py:
import config

def do_stuff(params=config.Params()):
    print(params.x)  # We did something useful here
And in your top.py you can use:
params = config.Params(get_the_value())
other.do_stuff(params)
But you could also have any use case specific source of value(s):
class TopParams:
    def __init__(self, url):
        self.x = get_value_from_url(url)

params = TopParams("https://example.com/value-source")
other.do_stuff(params)
x could even be a property which you retrieve every time you access it... or lazily when needed and then cached... Again, it really then is a matter of what you need to do.
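For instance, a rough sketch of the lazy-and-cached variant (assuming Python 3.8+ for functools.cached_property; get_value_from_url is the same hypothetical helper as above):
import functools

class TopParams:
    def __init__(self, url):
        self._url = url

    @functools.cached_property
    def x(self):
        # fetched on first access, cached on the instance afterwards
        return get_value_from_url(self._url)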
"Is it bad practice to modify attributes of one module from another module?"
Yes, it is considered bad practice: it is a violation of the Law of Demeter, which in essence means "talk to friends, not to strangers".
Objects should expose behaviour and functions, but should HIDE the data.
DataStructures should EXPOSE data, but should not have any (exposed) methods. The Law of Demeter does not apply to such DataStructures. OOP purists might cover such DataStructures with setters and getters, but that really adds no value in Python.
There is a lot of literature about this, for example https://en.wikipedia.org/wiki/Law_of_Demeter, and of course a must-read: "Clean Code" by Robert C. Martin (Uncle Bob). Check it out on YouTube as well.
For procedural programming it is perfectly normal to keep data in a DataStructure which does not have any (exposed) methods.
The procedures in the program work with that data. Consider using the attrs module (see https://www.attrs.org/en/stable/) for easy creation of such classes.
My preferred method for keeping config is (here without using attrs):
# conf_xy.py
"""
config is code - so why use damned parsers, textfiles, xml, yaml, toml and all that
if You just can use testable code as config that can deliver the correct types, etc.
as well as hinting in Your favorite IDE ?
Here, for demonstration without using attrs package - usually I use attrs (read the docs)
"""

class ConfXY(object):
    def __init__(self) -> None:
        self.x: int = 1
        self.z: float = get_z_from_input()
        ...

conf_xy = ConfXY()

# other.py
from conf_xy import conf_xy
...
y = conf_xy.x * 2
...
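And, just as a rough illustration of the attrs suggestion above (get_z_from_input is the same placeholder as in the class-based example; double-check the attrs docs for your installed version):
# conf_xy.py, attrs flavour
import attr

@attr.s(auto_attribs=True)
class ConfXY:
    x: int = 1
    z: float = attr.Factory(get_z_from_input)  # placeholder input function from above

conf_xy = ConfXY()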
I'm pretty new to django and came across something that confuses me in this views.py file I've created. I just played around with it a little and came up with something that works, but I don't get why it does.
The class Draft_Order (which I have in another file) requests the NBA stats page, performs some calculations on the backend, and spits out draft lottery odds (for the new draft). The methods initialize, sim_draft, and get_standings all do things on the backend (which works perfectly).
Now, my question is that I don't get why I can create an instance "f" of the class Draft_Order outside all of the functions, yet still be able to reference it within most of my functions even though they are getting called from my urls.py file, so it doesn't seem like they should be working at all. Also, for some reason, the update function can only reference "f" if I don't have an assignment to f in the function, e.g. if I add the line
f = temp
Then all of a sudden it gives me an UnboundLocalError, saying that f is referenced before assignment.
I'd appreciate any help on this. Thanks.
from django.shortcuts import render
from django.http.response import HttpResponse
from simulator.draft_simulator import Draft_Order
from simulator.models import Order
# Create your views here.
f = Draft_Order()
f.initialize()
def index(request):
    return HttpResponse('<p>Hello World</p>')

def init(request):
    return HttpResponse(f.initalodds.to_html())

def table(request):
    f.sim_draft()
    return HttpResponse(f.finaltable.to_html())

def update(request):
    temp = Draft_Order()
    temp.get_standings()
    if temp == f:
        return HttpResponse('Same!')
    else:
        return HttpResponse('updated!')
UnboundLocalError happens because the presence of an assignment to f inside a function makes f local to that function, shadowing the global f for the whole function body. You need to explicitly state that f refers to the global variable:
def update(request):
    global f              # f now refers to the module-level name
    if f == ...:          # comparison logic as before
        ...
    f = Draft_Order()     # new draft order; this now rebinds the global f
But really, you shouldn't rely on global values stored in RAM, because in a production environment you'll have several processes, each with probably a different f, and you won't be able to control the lifetime of those processes. Better to rely on persistent storage here (DBs, key-value stores, files, etc.).
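For example, one possible (and only sketched) alternative is to keep the computed result in Django's cache framework instead of a module-level object, so every worker process reads the same data; the cache key and timeout are made up, and the Draft_Order attributes are assumed to match the question's code:
from django.core.cache import cache
from django.http.response import HttpResponse
from simulator.draft_simulator import Draft_Order

def table(request):
    finaltable_html = cache.get('draft_finaltable')
    if finaltable_html is None:
        f = Draft_Order()
        f.initialize()
        f.sim_draft()
        finaltable_html = f.finaltable.to_html()
        cache.set('draft_finaltable', finaltable_html, timeout=300)  # arbitrary timeout
    return HttpResponse(finaltable_html)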
You need to look into Python namespaces and scope.
But here is how I like to think of it to avoid going crazy (everything in Python is an object).
In simple terms, those .py files are modules; when Python is running, those modules are turned into objects, so you have a urls object, a views object, etc.
So any variable you define at module level becomes an attribute, and any function defined becomes a method.
I believe you do something like this in your urls.py:
from simulator import views
or
from simulator.views import update
which basically means: get the views object that represents the views.py file.
From the views object you are able to access your methods, like update.
Your update method is able to access f because, to quote the docs on Python namespaces and scope:
the global scope of a function defined in a module is that module’s namespace, no matter from where or by what alias the function is called.
Basically, your f is an attribute of the views object, meaning any method within the views object can access it.
The reason it works when called from urls.py is that methods can access the attributes of the object they are defined in, so since the update method is defined inside views, it is able to access views' attributes.
Please read more on Python namespaces and scope; this is a very simplified explanation.
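A minimal illustration of the quoted rule (module names are made up): a global defined in one module stays visible to that module's functions no matter where they are called from:
# mod_a.py
f = "defined in mod_a"

def show():
    print(f)   # f is looked up in mod_a's namespace

# mod_b.py
from mod_a import show
show()         # prints "defined in mod_a", even though the call happens in mod_b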
Is there any way in Python to initialize a variable only once and then just import it into the remaining modules?
I have the following Python project structure:
api
    v1
        init.py
    v2
        init.py
    init.py
    logging.py
logging.py:
from raven import Client
sentry = None
def init_sentry():
global sentry
sentry = 'some_dsn'
api/init.py
from app import logging
logging.init_sentry()
#run flask server (v1,v2)
api/{v1,v2}/init.py
from logging import sentry
try:
    1 / 0
except ZeroDivisionError:
    sentry.captureException()
In the files api/v1/init.py and api/v2/init.py I get a NoneType error on the sentry variable. I know I could call init_sentry in every file where I use it, but I'm looking for a better way.
Thanks
First, I think you misspelled init.py; it should be __init__.py.
It is bad programming style to pass data between modules via a variable. You should use a class or a function to handle shared data. That way you have an API, and it is clear which variables can be modified by other modules.
But to answer your question: I would (though I really don't recommend it) create a module data.py with a shared = {} dictionary. From other modules, just by importing it, you can share the data. By checking for a variable, or just a flag like moduleA_initialized, you can decide whether the module still needs to be initialized.
As an alternative, you can write directly to the globals() dictionary. Note: this is even worse programming practice, and you should check the names carefully so that there are no conflicts with any library you may use. gettext writes to it, but that is a pretty special case.
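A minimal sketch of that (discouraged) shared-dict idea, with made-up module and key names:
# data.py
shared = {}

# moduleA.py
import data

def init():
    if not data.shared.get('moduleA_initialized'):
        data.shared['sentry'] = 'some_dsn'
        data.shared['moduleA_initialized'] = True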
Here is one way to encapsulate the sentry variable and make sure that it is always calling into something, instead of accessing None:
logging.py:
class Sentry(object):
    _dsn = None

    @classmethod
    def _set_dsn(cls, dsn):
        cls._dsn = dsn

    @classmethod
    def __getattr__(cls, item):
        return getattr(cls._dsn, item)

sentry = Sentry

def init_sentry():
    Sentry._set_dsn('some_dsn')
Note:
This answer was also correct about the fact that you likely want __init__.py not init.py.
Is there a way how one can access host/group vars from within a custom written module? I would like to avoid to pass all required vars as module parameters.
My module is written in Python and I use the boilerplate. I checked pretty much all available vars but they are not stored anywhere:
from pprint import pprint

def main():
    pprint(dir())
    pprint(globals())
    pprint(locals())
    for name in vars().keys():
        print(name)
Now my only hope is they are somehow accessible through the undocumented module utils.
I guess it is not possible, since the module runs on the target machine and probably the facts/host/group vars are not transferred along with the module...
Edit: Found the module utils now and it doesn't look promising.
Is there a way how one can access host/group vars from within a custom
written module?
Not built-in.
You will have to pass them yourself one way or the other:
Module args (see the sketch below).
Serialize to local file system (with pickle or yaml.dump() or json or ...) and send the file over.
any other innovative ideas you can come up with.
Unfortunately you can't just send over the whole host/group var files as they are, because you would have to implement Ansible's variable scope/precedence resolution algorithm, which is undefined (it's not in the Zen philosophy of Ansible to define such petty things :P).
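To make the first option concrete, here is a rough sketch (all names are hypothetical): the playbook passes the needed variable explicitly, e.g. my_module: some_setting="{{ some_host_var }}", and the module declares it in its argument spec:
from ansible.module_utils.basic import AnsibleModule

def main():
    module = AnsibleModule(
        argument_spec=dict(
            some_setting=dict(type='str', required=True),
        )
    )
    # the resolved host/group var arrives as an ordinary module parameter
    some_setting = module.params['some_setting']
    module.exit_json(changed=False, some_setting=some_setting)

if __name__ == '__main__':
    main()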
--edit--
I see they have some precedence defined now.
Ansible does apply variable precedence, and you might have a use for
it. Here is the order of precedence from least to greatest (the last
listed variables override all other variables):
command line values (for example, -u my_user, these are not variables)
role defaults (defined in role/defaults/main.yml) [1]
inventory file or script group vars [2]
inventory group_vars/all [3]
playbook group_vars/all [3]
inventory group_vars/* [3]
playbook group_vars/* [3]
inventory file or script host vars [2]
inventory host_vars/* [3]
playbook host_vars/* [3]
host facts / cached set_facts [4]
play vars
play vars_prompt
play vars_files
role vars (defined in role/vars/main.yml)
block vars (only for tasks in block)
task vars (only for the task)
include_vars
set_facts / registered vars
role (and include_role) params
include params
extra vars (for example, -e "user=my_user")(always win precedence)
In general, Ansible gives precedence to variables that were defined
more recently, more actively, and with more explicit scope. Variables
in the defaults folder inside a role are easily overridden. Anything
in the vars directory of the role overrides previous versions of that
variable in the namespace. Host and/or inventory variables override
role defaults, but explicit includes such as the vars directory or an
include_vars task override inventory variables.
Ansible merges different variables set in inventory so that more
specific settings override more generic settings. For example,
ansible_ssh_user specified as a group_var is overridden by
ansible_user specified as a host_var. For details about the precedence
of variables set in inventory, see How variables are merged.
Footnotes
[1] Tasks in each role see their own role's defaults. Tasks defined outside of a role see the last role's defaults.
[2] Variables defined in inventory file or provided by dynamic inventory.
[3] Includes vars added by 'vars plugins' as well as host_vars and group_vars which are added by the default vars plugin shipped with Ansible.
[4] When created with set_facts's cacheable option, variables have the high precedence in the play, but are the same as a host facts precedence when they come from the cache.
As per your suggestion in your answer here, I did manage to read host_vars and local play vars through a custom Action Plugin.
I'm posting this answer for completeness sake and to give an explicit example of how one might go about this method, although you gave this idea originally :)
Note - this example is incomplete in terms of a fully functioning plugin. It just shows how to access variables.
from ansible.template import is_template
from ansible.plugins.action import ActionBase

class ActionModule(ActionBase):

    def run(self, tmp=None, task_vars=None):
        # some boilerplate ...

        # init
        result = super(ActionModule, self).run(tmp, task_vars)

        # more boilerplate ...

        # check the arguments passed to the task; if missing, return None
        self._task.args.get('<TASK ARGUMENT NAME>', None)
        # or
        # check if the play has vars defined
        task_vars['vars']['<ARGUMENT NAME>']
        # or
        # check if the host vars have something defined
        task_vars['hostvars']['<HOST NAME FROM HOSTVARS>']['<ARGUMENT NAME>']

        # again boilerplate...

        # build arguments to pass to the module
        some_module_args = dict(
            arg1=arg1,
            arg2=arg2
        )

        # call the module with the above arguments...
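        # (Not part of the original answer: one way that call might look,
        #  using ActionBase._execute_module; 'some_module' is a placeholder name.)
        result.update(self._execute_module(
            module_name='some_module',
            module_args=some_module_args,
            task_vars=task_vars,
            tmp=tmp,
        ))
        return result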
In case your playbook variables contain Jinja2 templates, you can resolve them in the plugin as follows:
from ansible.template import is_template

# check if the variable is a template through 'is_template'
if is_template(var, self._templar.environment):
    # access the internal `_templar` object to resolve the template
    resolved_arg = self._templar.template(var_arg)
Some words of caution:
If you have a variable defined in your playbook as follows:
# things ...
#
vars:
  - pkcs12_path: '{{ pkcs12_full_path }}'
  - pkcs12_pass: '{{ pkcs12_password }}'
The variable pkcs12_path must not match the host_vars name.
For instance, if you had pkcs12_path: '{{ pkcs12_path }}', then resolving the template with the above code would cause a recursion exception... This might be obvious to some, but for me it was surprising that the host_vars variable and the playbook variable must not have the same name.
You can also access variables through task_vars['<ARG_NAME>'], but I'm not sure where it's reading this from. Also it's less explicit than taking variables from task_vars['vars']['<ARG_NAME>'] or from the hostvars.
PS - at the time of writing, the example follows the basic structure of what Ansible considers an Action Plugin. In the future, the run method might change its signature...
I think you pretty much hit the nail on the head with your thinking here:
I guess it is not possible, since the module runs on the target machine and probably the facts/host/group vars are not transferred along with the module...
However, having said that, if you really have a need for this then there might be a slightly messy way of doing it. As of Ansible 1.8 you can set up fact caching, which uses redis to cache facts between runs of plays. Since redis is pretty easy to use and has clients for most popular programming languages, you could have your module query the redis server for any facts you need. It's not exactly the cleanest way to do it, but it just might work.
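A very rough sketch of what that could look like from inside a module (the key layout assumes the default fact_caching_prefix of 'ansible_facts' and a local redis; verify both against your ansible.cfg and Ansible version before relying on it):
import json
import redis

def cached_facts_for(hostname, prefix='ansible_facts'):
    conn = redis.Redis(host='localhost', port=6379)
    raw = conn.get(prefix + hostname)   # key layout is an assumption, not a guarantee
    return json.loads(raw) if raw else {}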
I'm using the following code to populate __all__ in my module's __init__.py and I was wondering if there was a more efficient way. Any ideas?
import fnmatch
import os
__all__ = []
for root, dirnames, filenames in os.walk(os.path.dirname(__file__)):
    root = root[os.path.dirname(__file__).__len__():]
    for filename in fnmatch.filter(filenames, "*.py"):
        __all__.append(os.path.join(root, filename[:-3]))
You probably shouldn't be doing this: The default behaviour of import is quite flexible. If you don't want a module (or any other variable) to be automatically exported, give it a name that starts with _ and python won't export it. That's the standard python way, and reinventing the wheel is considered unpythonic. Also, don't forget that other things besides modules may need exporting; once you set __all__, you'll need to find and export them as well.
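To illustrate the underscore convention with made-up names: submodules whose bound name starts with an underscore are simply skipped by a star import when no __all__ is defined:
# mypackage/__init__.py
from . import _internal     # leading underscore: not exported by *
from . import public_api    # exported by default

# client code
from mypackage import *     # binds public_api, but not _internal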
Still, you ask how to best generate a list of your exportable modules. Since you can't export what's not present, I'd just check what modules of your own are known to your main module:
import os
import sys

basedir = os.path.dirname(__file__)
for m in sys.modules:
    if m in locals() and not m.startswith('_'):  # Only export regular names
        mod = locals()[m]
        if '__file__' in mod.__dict__ and mod.__file__.startswith(basedir):
            print(m)
sys.modules includes the names of every module that Python has loaded, including many that have not been exported to your main module, so we check if they're in locals().
This is faster than scanning your filesystem, and more robust than assuming that every .py file in your directory tree will somehow end up as a top-level submodule. Naturally you should run this code near the end of your __init__.py, when everything has been loaded.
I work with a few complex packages that have sub-packages and sub-modules. I like to control this on a module by module basis. I use a simple package called auto-all which makes it easy (full disclosure - I am the author).
https://pypi.org/project/auto-all/
Here's an example:
from auto_all import start_all, end_all
# Define some internal stuff
start_all(globals())
# Define some external stuff
end_all(globals())
The reason I use this approach is mainly because of imports. As mentioned by alexis, you can implicitly make things private by prefixing object names with an underscore; however, this can get messy or just impractical for imported objects. Consider the following code:
from pyspark.sql.session import SparkSession
If this appears in your module then you will be implicitly making SparkSession available to be accessed from outside the module. The alternative is to prefix all imported items with underscores, for example:
from pyspark.sql.session import SparkSession as _SparkSession
This also isn't ideal, so manually managing __all__ is the only way (I'm aware of) to manage what you make externally available.
You can easily do this by explicitly setting the contents of the __all__ variable (which is the pythonic way), but this can become tedious when managing a large number of objects, and can also lead to issues if a developer adds a new object and doesn't expose it by adding to the __all__ variable. This type of thing can slip through code reviews. Using simple helper functions to manage the variable contents makes this much easier.
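For comparison, the explicit form referred to above is just a hand-maintained list in __init__.py (module and object names here are placeholders):
# mypackage/__init__.py
from ._engine import Engine
from ._helpers import run_job

__all__ = ['Engine', 'run_job']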