Should I be worried about Django template inefficiencies? - python

I'm relatively new to Django, and I'm using version 1.5 to build a REST api. Calls to the api expect JSON to be returned (I'm using this with an Ember.js front-end).
I'm wondering if I can't do something like this:
def listproject(request, pk_id):
    # list single project at /projects/<pk_id>
    project = Project.objects.get(pk=pk_id)
    snapshots = Snapshot.objects.filter(project=project)
    # (both are same up to here)
    return render_to_response('project.json',
                              {"project": project, "snapshots": snapshots},
                              mimetype="text/json")
Where project.json is this Django template:
{
    "id": "{{ project.pk }}",
    "name": "{{ project.name }}",
    "snapshot_ids": [ {% for snapshot in snapshots %}"{{ snapshot.pk }}"{% if not forloop.last %}, {% endif %}{% endfor %} ]
}
Someone who has worked with Django much longer than I have is suggesting that using templates for this will be inefficient. He suggests I do the following instead:
def listproject(request, pk_id):
    # list single project at /projects/<pk_id>
    project = Project.objects.get(pk=pk_id)
    snapshots = Snapshot.objects.filter(project=project)
    # (both are same up to here)
    ret_json = []
    ret_json.append('{"id": "' + str(project.pk) + '", ')
    ret_json.append('"name": "' + project.name + '", "snapshot_ids": [')
    snapshot_json = []
    for snapshot in snapshots:
        snapshot_json.append('"' + str(snapshot.pk) + '",')
    ret_json.append(''.join(snapshot_json)[0:-1] + ']}')
    return HttpResponse(content=''.join(ret_json), mimetype="text/json")
I've tested both. They work identically, though my version produces more readable JSON.
Please help us end our debate! Which is more efficient (and why)?

It's true that Django templates are not particularly efficient. However, that's only really a problem when you have very large templates that themselves extend or include many other templates, for example in a complex content management system. With a single template containing a small number of fields like you have, template rendering is insignificant compared to the overall overhead of serving the request.
That said, I'm a bit confused by both of your alternatives. Why aren't you generating JSON via the standard json library? That's the proper way to do it, not by building up strings either in templates or in Python code.
import json

ret = {'id': project.id,
       'name': project.name,
       'snapshot_ids': [snapshot.id for snapshot in snapshots]}
ret_json = json.dumps(ret)

Both of these options look horrible to me. I'd prefer to avoid 'hand-writing' the JSON as much as possible and just convert directly from Python data structures.
Fortunately the json module is designed for this.
import json

def listproject(request, pk_id):
    # list single project at /projects/<pk_id>
    project = Project.objects.get(pk=pk_id)
    snapshots = Snapshot.objects.filter(project=project)
    data = {
        "id": project.pk,
        "name": project.name,
        "snapshot_ids": [snapshot.pk for snapshot in snapshots],
    }
    return HttpResponse(content=json.dumps(data), mimetype="text/json")
Reasons to avoid 'hand-writing' the code are obvious - avoid bugs from typos, code is shorter and simpler, json module is likely to be faster.
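To make the "bugs" point concrete, here is a minimal sketch (the project name is an invented value, and is_valid_json is a helper made up for this illustration). It shows that concatenated strings stop being valid JSON as soon as a field contains a double quote, while json.dumps escapes it:

```python
import json

# Hypothetical project name containing double quotes.
name = 'My "first" project'

# Hand-built JSON: the embedded quotes are not escaped.
hand_written = '{"name": "' + name + '"}'

# json.dumps escapes the embedded quotes.
generated = json.dumps({"name": name})


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


print(is_valid_json(hand_written))  # False
print(is_valid_json(generated))     # True
```

The same problem applies to the template version, since the template also places the raw name between quotes.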
If you are concerned about the 'readability' of the generated JSON the json module provides some options for controlling the output (indents etc):
http://docs.python.org/2/library/json.html
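As a small sketch of those options (the data values are invented), indent produces human-readable output, while separators produces the most compact form:

```python
import json

# Invented sample data shaped like the project response.
data = {"id": 1, "name": "demo", "snapshot_ids": [10, 11]}

# Compact output: no whitespace after the separators.
compact = json.dumps(data, separators=(",", ":"), sort_keys=True)

# Indented output for human readers.
pretty = json.dumps(data, indent=2, sort_keys=True)

print(compact)
print(pretty)
```

Both strings decode to the same Python structure, so the choice only affects readability and payload size.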

I usually use this little function:
import json
from django.http import HttpResponse

def json_response(ob):
    return HttpResponse(
        json.dumps(ob), mimetype="application/json")
So then you can just return the result of that from a view:
def listproject(request, pk_id):
    project = Project.objects.get(pk=pk_id)  # Use get_object_or_404 ?
    snapshots = Snapshot.objects.filter(project=project)
    return json_response({
        "id": project.pk,
        "name": project.name,
        "snapshot_ids": [snapshot.pk for snapshot in snapshots],
    })

Related

Is there a way to check if a path is absolute in jinja2?

In the pydata-sphinx-theme we need to check if a path is absolute or not before adding it to the template. Currently we use the following:
{% set image_light = image_light if image_light.startswith("http") else pathto('_static/' + image_light, 1) %}
It's working but fails to capture local files and many other absolute configurations. Is there a more elegant way to perform this check ?
I would consider implementing this logic in Python proper, and bundle it as a custom template function. This way it'd be much easier to implement, debug and test.
Thanks @klas Š. for the guidance.
For anyone coming here, I added:
from urllib.parse import urlparse

# The registration function
def setup_is_absolute(app, pagename, templatename, context, doctree):
    def is_absolute(link):
        return bool(urlparse(link).netloc) or link.startswith("/")
    context['is_absolute'] = is_absolute

# Your extension's setup function
def setup(app):
    app.connect("html-page-context", setup_is_absolute)
and in my template:
{{ is_absolute(logo) }}
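One advantage of moving the check into Python is that it can be exercised outside Sphinx. A quick sketch with invented sample links, using the same is_absolute body as above:

```python
from urllib.parse import urlparse


def is_absolute(link):
    """True for URLs with a network location or paths starting with '/'."""
    return bool(urlparse(link).netloc) or link.startswith("/")


# Invented sample links: full URL, protocol-relative URL,
# root-relative path, and a path relative to the current page.
samples = [
    "https://example.com/logo.png",
    "//cdn.example.com/logo.png",
    "/static/logo.png",
    "_static/logo.png",
]
for link in samples:
    print(link, is_absolute(link))
```

Only the last sample is reported as relative, so it is the only one that would still need pathto().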

How to speed up returning a 20MB Json file from a Python-Flask application?

I am trying to call an API which in turn triggers a stored procedure in our SQL Server database. This is how I coded it.
class Api_Name(Resource):
    def __init__(self):
        pass

    @classmethod
    def get(self):
        try:
            engine = database_engine
            connection = engine.connect()
            sql = "DECLARE @return_value int EXEC @return_value = [dbname].[dbo].[proc_name]"
            return call_proc(sql, apiname, starttime, connection)
        except Exception as e:
            return {'message': 'Proc execution failed with error => {error}'.format(error=e)}, 400
        pass
call_proc is the method where I return the JSON from database.
def call_proc(sql: str, connection):
    try:
        json_data = []
        rv = connection.execute(sql)
        for result in rv:
            json_data.append(dict(zip(result.keys(), result)))
        return Response(json.dumps(json_data), status=200)
    except Exception as e:
        return {'message': '{error}'.format(error=e)}, 400
    finally:
        connection.close()
The problem with the output is the way JSON is returned and the size of it.
At first the API took 1 minute 30 seconds, when the return statement looked like this:
case1: return Response(json.dumps(json_data), status=200, mimetype='application/json')
After looking online, I found that the above statement tries to prettify the JSON. So I removed mimetype from the response and changed it to:
case2: return Response(json.dumps(json_data), status=200)
Now the API runs for 30 seconds. The JSON output is not aligned properly, but it's still JSON.
The size of the JSON returned from the API is close to 20MB. I observed this in the Postman response:
Status: 200 OK Time: 29s Size: 19MB
The difference in Json output:
case1:
[ {
"col1":"val1",
"col2":"val2"
},
{
"col1":"val1",
"col2":"val2"
}
]
case2:
[{"col1":"val1","col2":"val2"},{"col1":"val1","col2":"val2"}]
Is the output from the two aforementioned cases actually different? If so, how can I fix the problem?
If there is no difference, is there any way to speed this up and reduce the run time further, for example by compressing the JSON I am returning?
You can use gzip compression to shrink plain text from megabytes down to kilobytes, or use the flask-compress library for that.
I'd also suggest using ujson to make the dumps() call faster.
import gzip
from flask import Flask, make_response
import ujson as json

app = Flask(__name__)

@app.route('/data.json')
def compress():
    compression_level = 5  # of 9 max
    data = [
        {"col1": "val1", "col2": "val2"},
        {"col1": "val1", "col2": "val2"}
    ]
    content = gzip.compress(json.dumps(data).encode('utf8'), compression_level)
    response = make_response(content)
    response.headers['Content-length'] = len(content)
    response.headers['Content-Encoding'] = 'gzip'
    return response
Documentation:
https://docs.python.org/3/library/gzip.html
https://github.com/colour-science/flask-compress
https://pypi.org/project/ujson/
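To get a feel for the effect without running a server, here is a standard-library-only sketch. The rows are invented and highly repetitive, so real data will usually compress less well:

```python
import gzip
import json

# Invented sample rows, repetitive like a typical result set.
rows = [{"col1": "val1", "col2": "val2"} for _ in range(10000)]

raw = json.dumps(rows).encode("utf-8")
compressed = gzip.compress(raw, compresslevel=5)

# Compare the payload sizes before and after compression.
print(len(raw), len(compressed))

# A client that understands gzip reverses the step before parsing.
restored = json.loads(gzip.decompress(compressed).decode("utf-8"))
```

Note that a gzip-encoded body should only be sent to clients whose Accept-Encoding request header includes gzip.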
First of all, profile: if 90% of the time is being spent transferring across the network, then optimising processing speed is less useful than optimising transfer speed (for example, by compressing the response as wowkin recommended, though the web server may be configured to do this automatically, if you are using one).
Assuming that constructing the JSON is slow, if you control the database code you could use its JSON capabilities to serialise the data, and avoid doing it at the Python layer. For example,
SELECT col1, col2
FROM tbl
WHERE col3 > 42
FOR JSON AUTO
would give you
[
    {
        "col1": "foo",
        "col2": 1
    },
    {
        "col1": "bar",
        "col2": 2
    },
    ...
]
Nested structures can be created too, described in the docs.
If the requester only needs the data, return it as a download using flask's send_file feature and avoid the cost of constructing an HTML response:
from io import BytesIO
from flask import send_file

def call_proc(sql: str, connection):
    try:
        rv = connection.execute(sql)
        json_data = rv.fetchone()[0]
        # BytesIO expects encoded data; if you can get the server to encode
        # the data instead it may be faster.
        encoded_json = json_data.encode('utf-8')
        buf = BytesIO(encoded_json)
        return send_file(buf, mimetype='application/json', as_attachment=True, conditional=True)
    except Exception as e:
        return {'message': '{error}'.format(error=e)}, 400
    finally:
        connection.close()
You need to implement pagination on your API. 19MB is absurdly large and will lead to some very annoyed users.
gzip and cleverness with the JSON responses will sadly not be enough; you'll need to put in a bit more legwork.
Luckily, there are many pagination questions and answers, and Flask's modular approach means that someone has probably written a module that's applicable to your problem. I'd start off by re-implementing the method with an ORM. I heard that sqlalchemy is quite good.
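As a rough illustration of what a paginated response could carry, here is a framework-free sketch. The function name, parameter names and response shape are all invented for this example, and in a real API the slicing would normally be pushed down into the SQL query rather than done on a full in-memory list:

```python
def paginate(rows, page, per_page):
    """Return one page of rows plus the metadata a client needs to continue.

    page is 1-based. The response shape is illustrative only.
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    items = rows[start:start + per_page]
    total_pages = (len(rows) + per_page - 1) // per_page  # ceiling division
    return {
        "items": items,
        "page": page,
        "per_page": per_page,
        "total_pages": total_pages,
        "has_next": page < total_pages,
    }


rows = [{"id": i} for i in range(1, 8)]  # invented data: ids 1..7
result = paginate(rows, page=2, per_page=3)
print(result["items"])     # [{'id': 4}, {'id': 5}, {'id': 6}]
print(result["has_next"])  # True
```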
To answer your question:
1 - Both JSON documents are semantically identical.
You can use http://www.jsondiff.com to compare two JSON documents.
2 - I would recommend splitting your data into chunks and sending those across the network.
This might help:
https://masnun.com/2016/09/18/python-using-the-requests-module-to-download-large-files-efficiently.html
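Point 1 can also be checked locally with the json module. A small sketch using the two outputs from the question:

```python
import json

# case1: the "prettified" output from the question.
pretty = """[ {
  "col1":"val1",
  "col2":"val2"
},
{
  "col1":"val1",
  "col2":"val2"
}
]"""

# case2: the compact output from the question.
compact = '[{"col1":"val1","col2":"val2"},{"col1":"val1","col2":"val2"}]'

# Whitespace between tokens is insignificant, so both decode identically.
print(json.loads(pretty) == json.loads(compact))  # True
print(len(pretty), len(compact))
```

The only difference is the extra whitespace, which makes the first form slightly larger.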
TL;DR; Try restructuring your JSON payload (i.e. change schema)
I see that you are constructing the JSON response in one of your APIs. Currently, your JSON payload looks something like:
[
    {
        "col0": "val00",
        "col1": "val01"
    },
    {
        "col0": "val10",
        "col1": "val11"
    }
    ...
]
I suggest you restructure it in such a way that each (first level) key in your JSON represents the entire column. So, for the above case, it will become something like:
{
    "col0": ["val00", "val10", "val20", ...],
    "col1": ["val01", "val11", "val21", ...]
}
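A minimal sketch of the conversion in both directions follows. The function names are invented for this illustration, and it assumes every row has the same keys:

```python
import json


def rows_to_columns(rows):
    """Turn [{"a": 1, "b": 2}, ...] into {"a": [1, ...], "b": [2, ...]}.

    Assumes every row has the same keys as the first row.
    """
    if not rows:
        return {}
    return {key: [row[key] for row in rows] for key in rows[0]}


def columns_to_rows(columns):
    """Inverse of rows_to_columns, as a client could apply it."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]


rows = [
    {"col0": "val00", "col1": "val01"},
    {"col0": "val10", "col1": "val11"},
]
columns = rows_to_columns(rows)
print(columns)  # {'col0': ['val00', 'val10'], 'col1': ['val01', 'val11']}
print(len(json.dumps(rows)), len(json.dumps(columns)))
```

The saving comes from writing each column name once instead of once per row, so it grows with the number of rows.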
Here are the results from an offline test I performed.
Experiment variables:
NUMBER_OF_COLUMNS = 10
NUMBER_OF_ROWS = 100000
LENGTH_OF_STR_DATA = 5
#!/usr/bin/env python3
import json
import random
import string

NUMBER_OF_COLUMNS = 10
NUMBER_OF_ROWS = 100000
LENGTH_OF_STR_DATA = 5

def get_column_name(id_):
    return 'col%d' % id_

def random_data():
    return ''.join(random.choices(string.ascii_letters, k=LENGTH_OF_STR_DATA))

def get_row():
    return {
        get_column_name(i): random_data()
        for i in range(NUMBER_OF_COLUMNS)
    }

# data1 has same schema as your JSON
data1 = [
    get_row() for _ in range(NUMBER_OF_ROWS)
]

with open("/var/tmp/1.json", "w") as f:
    json.dump(data1, f)

def get_column():
    return [random_data() for _ in range(NUMBER_OF_ROWS)]

# data2 has the new proposed schema, to help you reduce the size
data2 = {
    get_column_name(i): get_column()
    for i in range(NUMBER_OF_COLUMNS)
}

with open("/var/tmp/2.json", "w") as f:
    json.dump(data2, f)
Comparing sizes of the two JSONs:
$ du -h /var/tmp/1.json
17M
$ du -h /var/tmp/2.json
8.6M
In this case, it almost got reduced by half.
I would suggest you do the following:
1. First and foremost, profile your code to find the real culprit. If it really is the payload size, proceed further.
2. Try changing your JSON's schema (as suggested above).
3. Compress your payload before sending (either at your Flask WSGI app layer or at the web server level, if you are running your Flask app behind a production-grade web server like Apache or Nginx).
For large data that you can't paginate, using something like ndjson (or any type of delimited record format) can really reduce the server resources needed, since you avoid holding the whole JSON object in memory. You would need access to the response stream to write each object/line to the response, though.
The response
[ {
"col1":"val1",
"col2":"val2"
},
{
"col1":"val1",
"col2":"val2"
}
]
Would end up looking like
{"col1":"val1","col2":"val2"}
{"col1":"val1","col2":"val2"}
This also has advantages on the client, since you can parse and process each line on its own as well.
If you aren't dealing with nested data structures, responding with a CSV is going to be even smaller.
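A minimal sketch of the line-delimited approach, using only the standard library (iter_ndjson is a name invented here). Flask responses accept an iterable body, so a generator like this could be handed to a response for streaming instead of being joined as in the demo:

```python
import json


def iter_ndjson(rows):
    """Yield one compact JSON document per line, one row at a time."""
    for row in rows:
        yield json.dumps(row, separators=(",", ":")) + "\n"


rows = [{"col1": "val1", "col2": "val2"}, {"col1": "val1", "col2": "val2"}]

# Joined here only for demonstration; a server would stream the chunks.
body = "".join(iter_ndjson(rows))
print(body, end="")

# The client parses each line independently.
parsed = [json.loads(line) for line in body.splitlines()]
```

Because each row is serialized on demand, the server never needs the whole encoded payload in memory at once.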
I want to note that there is a standard way to write a sequence of separate records in JSON, and it's described in RFC 7464. For each record:
Write the record separator byte (0x1E).
Write the JSON record, which is a regular JSON document that can also contain inner line breaks, in UTF-8.
Write the line feed byte (0x0A).
(Note that the JSON text sequence format, as it's called, uses a more liberal syntax for parsing text sequences of this kind; see the RFC for details.)
In your example, the JSON text sequence would look as follows, where \x1E and \x0A are the record separator and line feed bytes, respectively:
\x1E{"col1":"val1","col2":"val2"}\x0A\x1E{"col1":"val1","col2":"val2"}\x0A
Since the JSON text sequence format allows inner line breaks, you can write each JSON record as you naturally would, as in the following example:
\x1E{
"col1":"val1",
"col2":"val2"}
\x0A\x1E{
"col1":"val1",
"col2":"val2"
}\x0A
Notice that the media type for JSON text sequences is not application/json, but application/json-seq; see the RFC.
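The three steps can be sketched as follows. The function names are invented, and the decoder is deliberately simplified: it does not implement the more liberal parsing rules or the truncated-record handling that the RFC describes:

```python
import json

RS = b"\x1e"  # record separator byte
LF = b"\x0a"  # line feed byte


def encode_json_seq(records):
    """Serialize records as a JSON text sequence: RS + JSON text + LF each."""
    return b"".join(RS + json.dumps(r).encode("utf-8") + LF for r in records)


def decode_json_seq(data):
    """Split on the record separator and parse each non-empty chunk.

    Simplified sketch: assumes every record is complete and valid.
    """
    return [json.loads(chunk.decode("utf-8"))
            for chunk in data.split(RS) if chunk.strip()]


records = [{"col1": "val1", "col2": "val2"}, {"col1": "val1", "col2": "val2"}]
encoded = encode_json_seq(records)
print(encoded)
print(decode_json_seq(encoded) == records)  # True
```

Splitting on 0x1E is safe for output produced this way because json.dumps escapes control characters inside strings.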

Python 2.7 reading template and returning new file with substitutions

I am currently loading my data into a variable (seen below as 'data') and then reading my template file and replacing %s with variables contained in 'data'. Here is my page reading, substitution, writing then displaying the new page on local server code:
def main():
    # process input into a page
    contents = makePage('varibletest.html', (
        data['Address'], data['Admin'], data['City'], data['ContractNo'],
        data['DealStatus'], data['Dealer'], data['Finance'], data['FinanceNumber'],
        data['First'], data['Last'], data['Message'], data['Notes'],
        data['Result'], data['SoldDate'], data['State'], data['Zip']))
    # display page
    browseLocal(contents, 'Z:/xampp/htdocs/', 'SmartFormTest{}.php'.format(data['ContractNo']))

def fileToStr(fileName):
    """Return a string containing the contents of the named file."""
    fin = open(fileName)
    contents = fin.read()
    fin.close()
    return contents

def makePage(templateFileName, substitutions):
    """Returns a string with substitutions into a format string taken
    from the named file. The single parameter substitutions must be in
    a format usable in the format operation: a single data item, a
    dictionary, or an explicit tuple."""
    pageTemplate = fileToStr(templateFileName)
    return pageTemplate % substitutions

def strToFile(text, savefile):
    """Write a file with the given name and the given text."""
    output = file(savefile, "w")
    output.write(text)
    output.close()

def browseLocal(webpageText, path, filename):
    """Start your webbrowser on a local file containing the text."""
    savefile = path + filename
    strToFile(webpageText, savefile)
    import webbrowser
    b = webbrowser
    b.open('192.168.1.254:1337/' + filename)

main()
Here is my template file (included is some silliness to demonstrate I have tried quite a few things to get this working):
%s
%s
%s
%s
%s
%s
%s.format(Address)
%s.format(data['Address'])
%s[2]
%s(2)
%s{2]
%s
%s
%s
%s
%s
When the new page is opened the variables are all there in sequential order. I need the ability to insert, say, address in multiple places.
Thanks in advance for your help!
EDIT --
Here's my new code with solution:
def main():
    fin = open('DotFormatTemplate.html')
    contents = fin.read()
    fin.close()
    output = contents.format(**data)
    print output

main()
Template file:
I live at
Address: {Address}
Hope this makes someone's life easier as it did mine!
Simple templates using string.format
The typical way of rendering a simple template by means of the string.format method:
data = {"Address": "Home sweet home", "Admin": "James Bond", "City": "London"}
template = """
I live at
Address: {Address}
in City of: {City}
and my admin is: {Admin}
"""
print template.format(**data)
which prints:
I live at
Address: Home sweet home
in City of: London
and my admin is: James Bond
The **data is needed to pass all data keywords and related values to the function.
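Since the original problem was inserting the same value in several places, here is a short sketch (with invented values) showing that a named field can be repeated freely, and that literal braces, which are common in HTML pages containing CSS or JavaScript, have to be doubled:

```python
data = {"Address": "Home sweet home", "City": "London"}

# The same named field may appear any number of times.
template = "Ship to: {Address}, {City}\nBill to: {Address}, {City}"
rendered = template.format(**data)
print(rendered)

# Literal braces must be doubled so format() does not treat them as fields.
css = "body {{ color: {Color}; }}".format(Color="red")
print(css)  # body { color: red; }
```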
Using Jinja2 for loopy templates
string.format is great in that it is part of the Python standard library. However, as soon as you come to more complex data structures, including lists and other iterables, string.format falls short or requires building the output part by part, which soon leaves your template broken into too many pieces.
There are many other templating libraries, jinja2 being my favourite one:
$ pip install jinja2
Then we can play this way:
>>> from jinja2 import Template
>>> jdata = {'Name': 'Jan', 'Hobbies': ['Python', 'collecting principles', 'DATEX II']}
>>> templstr = """
... My name is {{ Name }} and my Hobbies are:
...
... {% for hobby in Hobbies %}
... - {{ hobby }}
... {% endfor %}
... """
>>> templ = Template(templstr)
>>> print templ.render(jdata)
My name is Jan and my Hobbies are:
- Python
- collecting principles
- DATEX II
With jinja2 there is no need to call templ.render(**jdata), but such a call would also work.
Conclusions
The samples above should give you an initial idea of what can be done with templates and how they can be used.
In both cases the given solutions provide many more features; just read the docs and enjoy.

Display all jinja object attributes

Is there a way to display the name/content/functions of all attributes of a given object in a jinja template. This would make it easier to debug a template that is not acting as expected.
I am building a website using the hyde framework and this would come in quite handy since I am still learning the intricacies of both jinja and hyde.
Originally, I had thought it would work to use the attr filter, but this seems to require a name value. I would like not to have to specify the name in order to get all available attributes for the object.
Some google searching showed django syntax looks like the following, but I am not familiar with django so this may only apply to database items. Long story short, I would like a method that works kind of like this for any object named obj
{% for field, value in obj.get_fields %}
{{ field }} : {{ value }} </br>
{% endfor %}
final solution:
@jayven was right, I could create my own jinja2 filter. Unfortunately, using the stable version of hyde (0.8.4), this is not a trivial act of having a filter in the pythonpath and setting a simple yaml value in the site.yaml file (there is a pull request for that). That being said, I was able to figure it out! So the following is my final solution, which ends up being very helpful for debugging any unknown attributes.
It's easy enough to create site-specific hyde extensions: just create a local python package with the following directory tree
hyde_ext
    __init__.py
    custom_filters.py
Now create the extension:
from hyde.plugin import Plugin
from jinja2 import environmentfilter, Environment

debug_attr_fmt = '''name:  %s
type:  %r
value: %r'''

@environmentfilter
def debug_attr(env, value, verbose=False):
    '''
    A jinja2 filter that creates a <pre> block
    that lists all the attributes of a given object
    including the value of those attributes and type.

    This filter takes an optional variable "verbose",
    which prints underscore attributes if set to True.
    Verbose printing is off by default.
    '''
    begin = "<pre class='debug'>\n"
    end = "\n</pre>"

    result = ["{% filter escape %}"]
    for attr_name in dir(value):
        if not verbose and attr_name[0] == "_":
            continue
        a = getattr(value, attr_name)
        result.append(debug_attr_fmt % (attr_name, type(a), a))
    result.append("{% endfilter %} ")
    tmpl = Environment().from_string("\n\n".join(result))

    return begin + tmpl.render() + end

    #return "\n\n".join(result)

# list of custom-filters for jinja2
filters = {
    'debug_attr' : debug_attr
}

class CustomFilterPlugin(Plugin):
    '''
    The custom-filter plugin allows any
    filters added to the "filters" dictionary
    to be added to hyde
    '''
    def __init__(self, site):
        super(CustomFilterPlugin, self).__init__(site)

    def template_loaded(self, template):
        super(CustomFilterPlugin, self).template_loaded(template)
        self.template.env.filters.update(filters)
To let hyde know about the extension add hyde_ext.custom_filters.CustomFilterPlugin to the "plugins" list of the site.yaml file.
Lastly, test it out on a file, you can add this to some random page {{resource|debug_attr}} or the following to get even the underscore-attributes {{resource|debug_attr(verbose=True)}}
Of course, I should add, that it seems like this might become much easier in the future whenever hyde 1.0 is released. Especially since there is already a pull request waiting to implement a simpler solution. This was a great way to learn a little more about how to use jinja and hyde though!
I think you can implement a filter yourself, for example:
from jinja2 import *

def show_all_attrs(value):
    res = []
    for k in dir(value):
        res.append('%r %r\n' % (k, getattr(value, k)))
    return '\n'.join(res)

env = Environment()
env.filters['show_all_attrs'] = show_all_attrs

# using the filter
tmpl = env.from_string('''{{v|show_all_attrs}}''')

class Myobj(object):
    a = 1
    b = 2

print tmpl.render(v=Myobj())
Also see the doc for details: http://jinja.pocoo.org/docs/api/#custom-filters
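One caveat with dir() plus getattr() is that a property may raise when accessed, which would abort the whole dump. A jinja-free sketch of a more defensive variant follows (describe_attrs and Demo are names invented for this illustration). The function could be registered through env.filters in the same way as show_all_attrs above:

```python
def describe_attrs(obj, verbose=False):
    """Return 'name: repr(value)' lines for the attributes of obj.

    Underscore attributes are skipped unless verbose is True. Attributes
    whose lookup raises are reported instead of aborting the dump.
    """
    lines = []
    for name in dir(obj):
        if not verbose and name.startswith("_"):
            continue
        try:
            value = repr(getattr(obj, name))
        except Exception as exc:  # properties may raise on access
            value = "<error: %s>" % exc
        lines.append("%s: %s" % (name, value))
    return "\n".join(lines)


class Demo(object):
    a = 1

    @property
    def broken(self):
        raise RuntimeError("not available")


summary = describe_attrs(Demo())
print(summary)
```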

Django: Detect unused templates

Is there a way to detect unused templates in a Django project?
Before Django 1.3, that would have been possible with a simple string-matching function like this one. But since 1.3, there are generic class based views that automatically generate a template_name, if you don't override it (e.g. DetailView).
Also, if you override 3rd party module templates, those templates aren't used anywhere directly in your views.
Maybe it could be done by crawling all URL definitions, loading the corresponding views and getting the template_name from them?
I was curious if you could do this by monkey patching/decorating get_template instead. I think you can, though you have to find all the template loading functions (I have two in my example below).
I used wrapt when I noticed it went beyond just loader.get_template, and it seems to do the trick just fine. Of course, keep this 50000 km away from prod, but...
Now, the thing to note as well is that I am driving this with unittests and nosetests, so if you have full branch coverage of your template-using Python code, you should be able to get most templates (assuming I didn't miss any get_template-type functions).
in settings.py
This is the "brains" to patch get_template & co.
import wrapt
import django.template.loader
import django.template.engine

def wrapper(wrapped, instance, args, kwargs):
    # concatenate the args vector into a string.
    # print "\n\n\n\n%s\nI am a wrapper \nusage:%s\n%s\n\n\n\n\n" % ("*"*80, usage, "*"*80)
    try:
        return wrapped(*args, **kwargs)
    finally:
        usage = ",".join([unicode(arg) for arg in args if arg])
        track_usage(usage)

# you have to wrap whatever is loading templates...
# imported django module + class/method/function path of what needs to be
# wrapped within that module. comment those 2 lines out and you are back to
# normal
wrapt.wrap_function_wrapper(django.template.loader, 'get_template', wrapper)
wrapt.wrap_function_wrapper(django.template.engine, 'Engine.find_template', wrapper)
See safely-applying-monkey-patches-in-python for more details on wrapt. It is actually easier to use than to understand from the docs; decorators make my brain hurt.
Also, to track which django functions were doing the actual loads, I misspelled some template names on purpose in the code and in templates, ran unit tests on it and looked at the stacktraces for missing template exceptions.
This is my rather badly-written function which adds to a set and puts it into a json output....
import json

def track_usage(usage):
    fnp_usage = "./usage.json"

    try:
        with open(fnp_usage, "r") as fi:
            data = fi.read()
        # read the set of used templates from the json file
        j_data = json.loads(data)
        s_used_file = set(j_data.get("li_used"))
    except IOError:
        # no usage file yet: start with an empty set
        s_used_file = set()
        j_data = dict()

    s_used_file.add(usage)
    # convert the set back to a list for json compatibility
    j_data["li_used"] = list(s_used_file)

    with open(fnp_usage, "w") as fo:
        json.dump(j_data, fo)
and the output (with a script to format it):
import sys
import json

fnp_usage = sys.argv[1]

with open(fnp_usage, "r") as fi:
    data = fi.read()

# read the set of used templates from the json file
j_data = json.loads(data)
li_used_file = j_data.get("li_used")
li_used_file.sort()

print "\n\nused templates:"
for t in li_used_file:
    print(t)
From wrapping the 2 functions above, it seems to have caught extends, %includes and straight get_templates, as well as list-type templates that were being used by class-based views. It even caught my dynamically generated templates which aren't even on the file system but get loaded with a custom loader.
used templates:
bootstrap/display_form.html
bootstrap/errors.html
bootstrap/field.html
bootstrap/layout/baseinput.html
bootstrap/layout/checkboxselectmultiple.html
bootstrap/layout/field_errors.html
bootstrap/layout/field_errors_block.html
bootstrap/layout/help_text.html
bootstrap/layout/help_text_and_errors.html
bootstrap/layout/radioselect.html
bootstrap/whole_uni_form.html
django_tables2/table.html
dynamic_template:db:testdb:name:pssecurity/directive.PrimaryDetails.json
uni_form/layout/div.html
uni_form/layout/fieldset.html
websec/__base.html
websec/__full12.html
websec/__l_right_sidebar.html
websec/bootstrapped_home.html
websec/changedb.html
websec/login.html
websec/requirejs_config.html
websec/topnav.html
websec/user_msg.html
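With a list of used templates like the one above, candidates for removal can be approximated as the difference between what is on disk and what was loaded. The sketch below is only an assumption about how that could be done: find_unused_templates and the directory layout are invented, it looks at a single template directory, and its result is only as good as the test coverage that produced the usage list:

```python
import os
import tempfile


def find_unused_templates(template_dir, used):
    """Return template paths under template_dir that are not in used.

    Paths are relative to template_dir and use '/' separators, matching
    the names passed to get_template.
    """
    on_disk = set()
    for root, _dirs, files in os.walk(template_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), template_dir)
            on_disk.add(rel.replace(os.sep, "/"))
    return sorted(on_disk - set(used))


# Demo with a throwaway directory containing two invented templates.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("websec/login.html", "websec/old.html"):
        path = os.path.join(tmp, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    unused = find_unused_templates(tmp, ["websec/login.html"])

print(unused)  # ['websec/old.html']
```

Entries in the usage list that do not correspond to files, such as the dynamically generated template names, are simply ignored by the set difference.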
It's not possible to detect unused templates for certain, even in the absence of generic views, because you can always write code like this:
get_template(any_code_you_like()).render(context)
So even prior to Django 1.3 the django-unused-templates application you linked to could only have worked for projects that respected some kind of discipline about the use of templates. (For example, always having a string literal as the template argument to functions like get_template and render_to_response.)
Loading all the views wouldn't be sufficient either: a view may use different templates under different circumstances:
def my_view(request):
    if request.user.is_authenticated():
        return render(request, 'template1.html')
    else:
        return render(request, 'template2.html')
And of course templates may not be used by views at all, but by other parts of the system (for example, e-mail messages).
