Cache Django Templates in multi-tenant site

Cache Django Templates in multi-tenant site - python

I have a multi-tenant django project, where there is a single django instance serving multiple tenants from separate databases. It is fully operational in production.
I have template caching enabled, which was fine, until I implemented a per-tenant template loader, which essentially allows tenantXYZ to define a customised copy of /some_app/templates/some_template.html in a different directory. So if the url starts with tenantXYZ, the page loads the template from that custom directory instead of the base version of /some_app/templates/some_template.html, allowing very powerful per-tenant customisation.
The problem is template caching (which we only enable on production) returns a cached copy of /some_app/templates/some_template.html, therefore overriding the loader's functionality.
My plan is to subclass Django's django.template.loaders.cached.Loader to make it use a separate cache per tenant (which is what multi-tenant tools do with the normal cache anyway, see django-db-multitenant section on cache KEY_FUNCTION)
https://github.com/django/django/blob/master/django/template/loaders/cached.py
But I am struggling to understand exactly how the cached.Loader works. It seems that its 'cache' is just a dictionary (in which case I can change it to use a dictionary per tenant to keep the caches separate) and infers that there is only one instance of the loader running.
Are those assumptions correct, or does it actually use the CACHE settings (e.g. in a base class or elsewhere I may not be aware of)
Is there anything else I am missing which I need to take into account?
Here is Django's cached.Loader source code:
"""
Wrapper class that takes a list of template loaders as an argument and attempts
to load templates from them in order, caching the result.
"""
import hashlib
from django.template import TemplateDoesNotExist
from django.template.backends.django import copy_exception
from .base import Loader as BaseLoader
class Loader(BaseLoader):
def __init__(self, engine, loaders):
self.template_cache = {}
self.get_template_cache = {}
self.loaders = engine.get_template_loaders(loaders)
super().__init__(engine)
def get_contents(self, origin):
return origin.loader.get_contents(origin)
def get_template(self, template_name, skip=None):
"""
Perform the caching that gives this loader its name. Often many of the
templates attempted will be missing, so memory use is of concern here.
To keep it in check, caching behavior is a little complicated when a
template is not found. See ticket #26306 for more details.
With template debugging disabled, cache the TemplateDoesNotExist class
for every missing template and raise a new instance of it after
fetching it from the cache.
With template debugging enabled, a unique TemplateDoesNotExist object
is cached for each missing template to preserve debug data. When
raising an exception, Python sets __traceback__, __context__, and
__cause__ attributes on it. Those attributes can contain references to
all sorts of objects up the call chain and caching them creates a
memory leak. Thus, unraised copies of the exceptions are cached and
copies of those copies are raised after they're fetched from the cache.
"""
key = self.cache_key(template_name, skip)
cached = self.get_template_cache.get(key)
if cached:
if isinstance(cached, type) and issubclass(cached, TemplateDoesNotExist):
raise cached(template_name)
elif isinstance(cached, TemplateDoesNotExist):
raise copy_exception(cached)
return cached
try:
template = super().get_template(template_name, skip)
except TemplateDoesNotExist as e:
self.get_template_cache[key] = copy_exception(e) if self.engine.debug else TemplateDoesNotExist
raise
else:
self.get_template_cache[key] = template
return template
def get_template_sources(self, template_name):
for loader in self.loaders:
yield from loader.get_template_sources(template_name)
def cache_key(self, template_name, skip=None):
"""
Generate a cache key for the template name, dirs, and skip.
If skip is provided, only origins that match template_name are included
in the cache key. This ensures each template is only parsed and cached
once if contained in different extend chains like:
x -> a -> a
y -> a -> a
z -> a -> a
"""
dirs_prefix = ''
skip_prefix = ''
if skip:
matching = [origin.name for origin in skip if origin.template_name == template_name]
if matching:
skip_prefix = self.generate_hash(matching)
return '-'.join(s for s in (str(template_name), skip_prefix, dirs_prefix) if s)
def generate_hash(self, values):
return hashlib.sha1('|'.join(values).encode()).hexdigest()
def reset(self):
"Empty the template cache."
self.template_cache.clear()
self.get_template_cache.clear()

Related

Wagtail: How to override default ImageEmbedHandler?

I've been having some trouble implementing Wagtail CMS on my own Django backend. I'm attempting to use the 'headless' version and render content on my own SPA. As a result, I need to create my own EmbedHandlers so that I can generate URL's to documents and images to a private S3 bucket. Unfortunately, though I've registered my own PrivateS3ImageEmbedHandler, Wagtail is still using the default ImageEmbedHandler to convert the html-like bodies to html. Is there a way for me to set it so that Wagtail uses my custom EmbedHandler over the built in default?
Here's my code:
from wagtail.core import blocks, hooks
from messaging.utils import create_presigned_url
class PrivateS3ImageEmbedHandler(EmbedHandler):
identifier = "image"
#staticmethod
def get_model():
return get_user_model()
#classmethod
def get_instance(cls, attrs):
model = cls.get_instance(attrs)
print(model)
return model.objects.get(id=attrs['id'])
#classmethod
def expand_db_attributes(cls, attrs):
image = cls.get_instance(attrs)
print(image)
presigned_url = create_presigned_url('empirehealth-mso', image.file)
print(presigned_url)
return f'<img src="{presigned_url}" alt="it works!"/>'
#hooks.register('register_rich_text_features')
def register_private_images(features):
features.register_embed_type(PrivateS3ImageEmbedHandler)

You need to ensure that your #hooks.register('register_rich_text_features') call happens after the one in the wagtail.images app; this can be done by either putting your app after wagtail.images in INSTALLED_APPS, or by passing an order argument greater than 0:
#hooks.register('register_rich_text_features', order=10)

reusable apps and access control

Let's say I've created a (hopefully) reusable app, fooapp:
urls.py
urls('^(?P<userid>\d+)/$', views.show_foo),
and fooapp's views.py:
def show_foo(request, userid):
usr = shortcuts.get_object_or_404(User, pk=userid)
... display a users' foo ...
return render_to_response(...)
Since it's a reusable app, it doesn't specify any access control (e.g. #login_required).
In the site/project urls.py, the app is included:
urls('^foo/', include('fooapp.urls')),
How/where can I specify that in this site only staff members should be granted access to see a user's foo?
How about if, in addition to staff members, users should be able to view their own foo (login_required + request.user.id == userid)?
I didn't find any obvious parameters to include..
Note: this has to do with access control, not permissions, i.e. require_staff checks User.is_staff, login_required checks if request.user is logged in, and user-viewing-their-own-page is described above. This question is in regards to how a site can specify access control for a reusable app.

Well, I've found a way that works by iterating over the patterns returned by Django's include:
from django.core.urlresolvers import RegexURLPattern, RegexURLResolver
def urlpatterns_iterator(patterns):
"""Recursively iterate through `pattern`s.
"""
_patterns = patterns[:] # create a copy
while _patterns:
cur = _patterns.pop()
if isinstance(cur, RegexURLPattern):
yield cur
elif isinstance(cur, RegexURLResolver):
_patterns += cur.url_patterns
else:
raise ValueError("I don't know how to handle %r." % cur)
def decorate(fn, (urlconf_module, app_name, namespace)):
"""Iterate through all the urls reachable from the call to include and
wrap the views in `fn` (which should most likely be a decorator).
(the second argument is the return value of Django's `include`).
"""
# if the include has a list of patterns, ie.: url(<regex>, include([ url(..), url(..) ]))
# then urlconf_module doesn't have 'urlpatterns' (since it's already a list).
patterns = getattr(urlconf_module, 'urlpatterns', urlconf_module)
for pattern in urlpatterns_iterator(patterns):
# the .callback property will set ._callback potentially from a string representing the path to the view.
if pattern.callback:
# the .callback property doesn't have a setter, so access ._callback directly
pattern._callback = fn(pattern._callback)
return urlconf_module, app_name, namespace
this is then used in the site/project's urls.py, so instead of:
urls('^foo/', include('fooapp.urls')),
one would do:
from django.contrib.admin.views.decorators import staff_member_required as _staff_reqd
def staff_member_required(patterns): # make an include decorator with a familiar name
return decorate(_staff_reqd, patterns)
...
urls('^foo/', staff_member_required(include('fooapp.urls'))),

How to handle complex URL in a elegant way?

I'm writing a admin website which control several websites with same program and database schema but different content. The URL I designed like this:
http://example.com/site A list of all sites which under control
http://example.com/site/{id} A brief overview of select site with ID id
http://example.com/site/{id}/user User list of target site
http://example.com/site/{id}/item A list of items sold on target site
http://example.com/site/{id}/item/{iid} Item detailed information
# ...... something similar
As you can see, nearly all URL are need the site_id. And in almost all views, I have to do some common jobs like query Site model against database with the site_id. Also, I have to pass site_id whenever I invoke request.route_path.
So... is there anyway for me to make my life easier?

It might be useful for you to use a hybrid approach to get the site loaded.
def groupfinder(userid, request):
user = request.db.query(User).filter_by(id=userid).first()
if user is not None:
# somehow get the list of sites they are members
sites = user.allowed_sites
return ['site:%d' % s.id for s in sites]
class SiteFactory(object):
def __init__(self, request):
self.request = request
def __getitem__(self, key):
site = self.request.db.query(Site).filter_by(id=key).first()
if site is None:
raise KeyError
site.__parent__ = self
site.__name__ = key
site.__acl__ = [
(Allow, 'site:%d' % site.id, 'view'),
]
return site
We'll use the groupfinder to map users to principals. We've chosen here to only map them to the sites which they have a membership within. Our simple traversal only requires a root object. It updates the loaded site with an __acl__ that uses the same principals the groupfinder is setup to create.
You'll need to setup the request.db given patterns in the Pyramid Cookbook.
def site_pregenerator(request, elements, kw):
# request.route_url(route_name, *elements, **kw)
from pyramid.traversal import find_interface
# we use find_interface in case we improve our hybrid traversal process
# to take us deeper into the hierarchy, where Site might be context.__parent__
site = find_interface(request.context, Site)
if site is not None:
kw['site_id'] = site.id
return elements, kw
Pregenerator can find the site_id and add it to URLs for you automatically.
def add_site_route(config, name, pattern, **kw):
kw['traverse'] = '/{site_id}'
kw['factory'] = SiteFactory
kw['pregenerator'] = site_pregenerator
if pattern.startswith('/'):
pattern = pattern[1:]
config.add_route(name, '/site/{site_id}/' + pattern, **kw)
def main(global_conf, **settings):
config = Configurator(settings=settings)
authn_policy = AuthTktAuthenticationPolicy('seekrit', callback=groupfinder)
config.set_authentication_policy(authn_policy)
config.set_authorization_policy(ACLAuthorizationPolicy())
config.add_directive(add_site_route, 'add_site_route')
config.include(site_routes)
config.scan()
return config.make_wsgi_app()
def site_routes(config):
config.add_site_route('site_users', '/user')
config.add_site_route('site_items', '/items')
We setup our application here. We also moved the routes into an includable function which can allow us to more easily test the routes.
#view_config(route_name='site_users', permission='view')
def users_view(request):
site = request.context
Our views are then simplified. They are only invoked if the user has permission to access the site, and the site object is already loaded for us.
Hybrid Traversal
A custom directive add_site_route is added to enhance your config object with a wrapper around add_route which will automatically add traversal support to the route. When that route is matched, it will take the {site_id} placeholder from the route pattern and use that as your traversal path (/{site_id} is path we define based on how our traversal tree is structured).
Traversal happens on the path /{site_id} where the first step is finding the root of the tree (/). The route is setup to perform traversal using the SiteFactory as the root of the traversal path. This class is instantiated as the root, and the __getitem__ is invoked with the key which is the next segment in the path ({site_id}). We then find a site object matching that key and load it if possible. The site object is then updated with a __parent__ and __name__ to allow the find_interface to work. It is also enhanced with an __acl__ providing permissions mentioned later.
Pregenerator
Each route is updated with a pregenerator that attempts to find the instance of Site in the traversal hierarchy for a request. This could fail if the current request did not resolve to a site-based URL. The pregenerator then updates the keywords sent to route_url with the site id.
Authentication
The example shows how you can have an authentication policy which maps a user into principals indicating that this user is in the "site:" group. The site (request.context) is then updated to have an ACL saying that if site.id == 1 someone in the "site:1" group should have the "view" permission. The users_view is then updated to require the "view" permission. This will raise an HTTPForbidden exception if the user is denied access to the view. You can write an exception view to conditionally translate this into a 404 if you want.
The purpose of my answer is just to show how a hybrid approach can make your views a little nicer by handling common parts of a URL in the background. HTH.

For the views, you could use a class so that common jobs can be carried out in the __init__ method (docs):
from pyramid.view import view_config
class SiteView(object):
def __init__(self, request):
self.request = request
self.id = self.request.matchdict['id']
# Do any common jobs here
#view_config(route_name='site_overview')
def site_overview(self):
# ...
#view_config(route_name='site_users')
def site_users(self):
# ...
def route_site_url(self, name, **kw):
return self.request.route_url(name, id=self.id, **kw)
And you could use a route prefix to handle the URLs (docs). Not sure whether this would be helpful for your situation or not.
from pyramid.config import Configurator
def site_include(config):
config.add_route('site_overview', '')
config.add_route('site_users', '/user')
config.add_route('site_items', '/item')
# ...
def main(global_config, **settings):
config = Configurator()
config.include(site_include, route_prefix='/site/{id}')

What is the minimum amount of boilerplate for negotiating content in Django?

In a Django application, I have more than a handful of views which return JSON, with an invocation similar to:
return HttpResponse(json.dumps(content), mimetype="application/json")
I want to start creating views that return either HTML or JSON depending on the Accept headers from the request. Possibly other types, too, but those are the main ones. I also want to get multiple URLs routed to this view; the file extensions ".html" and ".json" help tell clients which types they should Accept when making their request, and I want to avoid the "?format=json" antipattern.
What's the correct, blessed way to do this in Django with a minimum of boilerplate or repeated code?
(Edit: Rephrase in order to better follow SO's community guidelines.)

I think a class-based view mixin (django 1.3+) is the easiest way to do this. All your views would inherit from a base class that contains logic to respond with the appropriate content.

I think I may not be seeing your big picture here but this is what I would do:
Have a html template that you render when html is requested and keep your json.dumps(content) for when json is requested. Seems to be obvious but I thought i should mention it anyway.
Set your URLs to send you "json" or 'html'. :
(r'^some/path/(?P<url_path>.*)\.(?P<extension>html|json)$', 'some.redirect.view'),
(r'^/(?P<extension>html|json)/AppName', include(MyApp)),
# etc etc
and your view:
def myRedirectView(request, url_path, extension):
view, args, kwargs = resolve("/" + extension + "/" + urlPath)
kwargs['request'] = request
return view(*args, **kwargs)
I know this is a bit vague because I haven't fully thought it through but its where I would start.

I have addressed this by creating a generic view class, based on Django's own generic.View class, that defines a decorator 'accept_types'. This modifies the view to which it is applied so that it returns None if the indicated content-type is not in the Accept header. Then, the get() method (which is called by the generic.View dispatcher) looks like this:
def get(self, request):
self.request = request # For clarity: generic.View does this anyway
resultdata = { 'result': data, etc. }
return (
self.render_uri_list(resultdata) or
self.render_html(resultdata) or
self.error(self.error406values())
)
The actual view renderers are decorated thus:
#ContentNegotiationView.accept_types(["text/uri-list"])
def render_uri_list(self, resultdata):
resp = HttpResponse(status=200, content_type="text/uri-list")
# use resp.write(...) to assemble rendered response body
return resp
#ContentNegotiationView.accept_types(["text/html", "application/html", "default_type"])
def render_html(self, resultdata):
template = loader.get_template('rovserver_home.html')
context = RequestContext(self.request, resultdata)
return HttpResponse(template.render(context))
The (one-off) generic view class that declares the decorator looks like this:
class ContentNegotiationView(generic.View):
"""
Generic view class with content negotiation decorators and generic error value methods
Note: generic.View dispatcher assigns HTTPRequest object to self.request.
"""
#staticmethod
def accept_types(types):
"""
Decorator to use associated function to render the indicated content types
"""
def decorator(func):
def guard(self, values):
accept_header = self.request.META.get('HTTP_ACCEPT',"default_type")
accept_types = [ a.split(';')[0].strip().lower()
for a in accept_header.split(',') ]
for t in types:
if t in accept_types:
return func(self, values)
return None
return guard
return decorator
(The parameter handling in the decorator should be generalized - this code works, but is still in development as I write this. The actual code is in GitHub at https://github.com/wf4ever/ro-manager/tree/develop/src/roverlay/rovweb/rovserver, but in due course should be separated to a separate package. HTH.)

What's the best way to disable Jinja2 template caching in bottle.py?

I'm using Jinja2 templates with Bottle.py and Google App Engine's dev_appserver for development. I want the templates to automatically reload on every request (or ideally only when they change), so that I don't have to keep restarting the server.
According to bottle's docs, you're supposed to be able to disable template caching by calling bottle.debug(True).
Jinja still seems to be caching its templates, though. I believe this to be because of the way the bottle jinja2 adapter is written (it just uses a default Jinja2 loader and doesn't expose many configuration options).
Following the Jinja2 Docs, I wrote this custom Loader that I would expect to trigger a template reload every time, but it doesn't seem to work either:
import settings
from bottle import jinja2_template
from bottle import Jinja2Template, template as base_template
class AutoreloadJinja2Template(Jinja2Template):
def loader(self, name):
def uptodate():
# Always reload the template if we're in DEVMODE (a boolean flag)
return not settings.DEVMODE
fname = self.search(name, self.lookup)
if fname:
with open(fname, "rb") as f:
source = f.read().decode(self.encoding)
return (source, fname, uptodate)
template = functools.partial(base_template,
template_adapter=AutoreloadJinja2Template,
template_lookup = settings.TEMPLATE_PATHS,
template_settings={
'auto_reload': settings.DEVMODE
}
)
Templates are still getting cached until I restart dev_appserver. This must be a fairly common problem. Does anyone have a solution that works?
UPDATE:
I ended up doing something like:
class CustomJinja2Template(Jinja2Template):
if settings.DEVMODE:
def prepare(self, *args, **kwargs):
kwargs.update({'cache_size':0})
return Jinja2Template.prepare(self, *args, **kwargs)
template = functools.partial(original_template, template_adapter=CustomJinja2Template)
This causes the templates to always reload, but only if a python module has been touched. i.e. if you just edit a template file, the changes won't take affect until you edit one of the python files that imports it. It seems as if the templates are still being cached somewhere.

I resolved this issue by ditching bottle's template solutions completely and using pure jinja2. It seems that Jijnja's FileSystemLoader is the only one, which can watch for file changes.
I defined new template function as follows (it looks for files in views/, just like bottle used to):
from jinja2 import Environment, FileSystemLoader
if local_settings.DEBUG:
jinja2_env = Environment(loader=FileSystemLoader('views/'), cache_size=0)
else:
jinja2_env = Environment(loader=FileSystemLoader('views/'))
def template(name, ctx):
t = jinja2_env.get_template(name)
return t.render(**ctx)
Then I use it like this:
#route('/hello')
def hello():
return template('index.tpl', {'text': "hello"})
The difference from bottle's API is that you have to include .tpl in file name and you have to pass context variables as dictionary.

Bottle caches templates internally (independent from Jinja2 caching). You can disable the cache via bottle.debug(True) or bottle.run(..., debug=True) or clear the cache with bottle.TEMPLATES.clear().

The Environment object in jinja2 has a configuration value for the cache size and, according to the documentation,
If the cache size is set to 0 templates are recompiled all the time
Have you tried something like this?
from jinja2 import Environment
env = Environment(cache_size=0)

Using bottle view decorator, you can just do #view('your_view', cache_size=0).
Bottle has a reloader=True parameter in server adapter, but I guess it works only with SimpleTemplate. I will try to extend this behaviour to other template engines.
If you want to do it in all your views, maybe you can do something like this:
import functools
view = functools.partials(view, cache_size=0)
This way, you can do it only when you are in debug mode adding an if statement to this code if bottle.DEBUG.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Cache Django Templates in multi-tenant site - python

Related

Wagtail: How to override default ImageEmbedHandler?

reusable apps and access control

How to handle complex URL in a elegant way?

What is the minimum amount of boilerplate for negotiating content in Django?

What's the best way to disable Jinja2 template caching in bottle.py?

Categories

Resources