Custom Django Func() for a complex SQL function - Python

In the process of finding a solution for Django ORM order by exact, I created a custom Django Func:
from django.db.models import Func

class Position(Func):
    function = 'POSITION'
    template = "%(function)s(LOWER('%(substring)s') in LOWER(%(expressions)s))"
    template_sqlite = "instr(lower(%(expressions)s), lower('%(substring)s'))"

    def __init__(self, expression, substring):
        super(Position, self).__init__(expression, substring=substring)

    def as_sqlite(self, compiler, connection):
        return self.as_sql(compiler, connection, template=self.template_sqlite)
which works as follows:
class A(models.Model):
    title = models.CharField(max_length=30)

data = ['Port 2', 'port 1', 'A port', 'Bport', 'Endport']
for title in data:
    A.objects.create(title=title)

search = 'port'
qs = A.objects.filter(
    title__icontains=search
).annotate(
    pos=Position('title', search)
).order_by('pos').values_list('title', flat=True)

# result is
# ['Port 2', 'port 1', 'Bport', 'A port', 'Endport']
But as @hynekcer commented:
"It crashes easily by ') in '') from myapp_suburb; drop ...
expected that the name of the app is "myapp" and autocommit is enabled."
The main problem is that the extra data (substring) gets into the template without SQL escaping, which leaves the app vulnerable to SQL injection attacks.
I cannot find the Django way to protect against that.
I created a repo (djposfunc) where you can test any solution.

TL;DR:
All examples with Func() in the Django docs can easily be used to safely implement other similar SQL functions with one argument.
All builtin Django database functions and conditional functions that are descendants of Func() are also safe by design. Usage beyond this limit needs the commentary below.
The class Func() is the most general part of Django query expressions. It allows almost any function or operator to be implemented in the Django ORM in some way. It is like a Swiss Army knife: very universal, but one must be a little more attentive not to cut oneself than with a specialized tool (like an electric cutter with an optical barrier). It is still much more secure than forging your own tool from a piece of iron with a hammer, just because an "upgraded", "secure" pocket knife once didn't fit into your pocket.
Security notes
The short documentation for Func(*expressions, **extra) with examples should be read first. (I recommend the development docs for Django 2.0, where more security information was recently added, including Avoiding SQL injection, related exactly to your example.)
All positional arguments in *expressions are compiled by Django; that is, Value(string) arguments are moved to the SQL parameters, where they are correctly escaped by the database driver.
Other strings are interpreted as field names F(name): they are prefixed by the right table_name alias and a dot (eventually a join to that table is added), and the names are treated by the quote_name() function.
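For illustration, a minimal sketch of that difference, reusing the model A and the INSTR function from the discussion above (the variable names are just for this example): positional expressions are compiled by Django, so the column name is quoted and the user string ends up in the parameter list.

from django.db.models import F, Func, Value

user_text = "port'); DROP TABLE myapp_a; --"
qs = A.objects.annotate(
    pos=Func(F('title'), Value(user_text), function='INSTR')
)
# F('title')       -> rendered as a quoted column name, e.g. "myapp_a"."title"
# Value(user_text) -> rendered as a %s placeholder; the string is escaped by the driver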
The problem is that the documentation in 1.11 is still brief, and the seductive parameters **extra and **extra_context are documented only vaguely. They can be used only for simple parameters that will never be "compiled" and never go through SQL params. Numbers and simple strings of safe characters, without apostrophes, backslashes or percent signs, are fine. A field name cannot be passed this way, because it would be neither quoted unambiguously nor joined. It is safe for previously checked numbers and for fixed strings like 'ASC'/'DESC', timezone names and other values coming from a drop-down list. There is still a weak point: drop-down list values must be checked on the server side, and numbers must be verified to really be numbers, not numeric strings like '2', because database functions silently accept a numeric string in place of a number, so the type mistake goes unnoticed. If a false "number" like '0) from my_app.my_table; rogue_sql; --' is passed, the injection is done; note that the rogue string doesn't contain any of the usually prohibited characters in this case. User-supplied numbers must be checked specifically, or the value must be passed through positional expressions.
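A hedged sketch of that rule (the Product model, its price field and the request parameter are hypothetical): a user-supplied value passed through **extra must be validated as a real number first, because it is interpolated into the template and never reaches the SQL parameter list.

from django.db.models import Func

precision = int(request.GET['precision'])  # int() raises on '0) ...; rogue_sql; --'
qs = Product.objects.annotate(
    rounded_price=Func(
        'price',
        function='ROUND',
        template='%(function)s(%(expressions)s, %(precision)d)',
        precision=precision,  # a verified int is safe to substitute into the template
    )
)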
It is safe to specify the function name and arg_joiner string attributes of a Func subclass, or the equivalent function and arg_joiner parameters of a Func() call. The template parameter should never contain apostrophes around a substituted expressions placeholder inside the parentheses, i.e. ('%(expressions)s'), because apostrophes are added by the database driver where necessary; extra apostrophes usually make the query fail outright, but sometimes the problem could be overlooked and lead to another security issue.
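For example, the verbose POSITION(substring IN string) form from the question can be written safely with positional expressions only, by setting arg_joiner and keeping the template free of apostrophes. This is a rough sketch, not the original class; note the argument order here is substring first.

from django.db.models import F, Func, Value

class Position(Func):
    function = 'POSITION'
    arg_joiner = ' IN '                          # renders POSITION(x IN y)
    template = '%(function)s(%(expressions)s)'   # no apostrophes around the placeholder

# usage: both arguments are compiled expressions, so user input is parametrized
qs = A.objects.annotate(pos=Position(Value(search), F('title')))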
Notes not related to security
Many simple builtin functions with one argument do not look as simple as possible because they are derived from multi-purpose descendants of Func. For example Length is a function that can also be used as a lookup Transform.
class Length(Transform):
    """Return the number of characters in the expression."""
    function = 'LENGTH'
    output_field = fields.IntegerField()  # the output type is sometimes specified
    # lookup_name = 'length'  # useful for the lookup, not for Func usage
A lookup transformation applies the same function to the left and right side of the lookup.
# I'm searching people with usernames longer than mine
qs = User.objects.filter(username__length__gt=my_username)
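The username__length lookup above only works once the transform is registered on the field class; Django's builtin Length already defines lookup_name = 'length', so the usual pattern is a one-line registration (shown here as a sketch):

from django.db.models import CharField
from django.db.models.functions import Length

CharField.register_lookup(Length)  # enables username__length in filters and ordering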
The same keyword arguments that can be passed to Func.as_sql(..., function=..., template=..., arg_joiner=...) can already be specified in Func.__init__() (if they are not overwritten by a custom as_sql()), or they can be set as attributes of a custom Func subclass.
Many SQL databases offer a verbose syntax like POSITION(substring IN string), which improves readability when named parameters are not supported (compare POSITION($1 IN $2)), and a brief variant, STRPOS(string, substring) (for PostgreSQL) or INSTR(string, substring) (for other databases), that is easier to implement with Func(); the readability is then restored by the Python wrapper with __init__(expression, substring).
Very complicated functions can also be implemented safely as a combination of several nested functions with simple arguments: Case(When(field_name=lookup_value, then=Value(value)), When(...), ..., default=Value(value)).
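As a hedged illustration of that pattern, applied to the model A from the question (ranking prefix matches above other matches; the rank values themselves are arbitrary):

from django.db.models import Case, IntegerField, Value, When

qs = A.objects.filter(title__icontains=search).annotate(
    rank=Case(
        When(title__istartswith=search, then=Value(0)),  # matches at the start come first
        When(title__icontains=search, then=Value(1)),
        default=Value(2),
        output_field=IntegerField(),
    )
).order_by('rank', 'title')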

Usually, what leaves you vulnerable to an SQL injection attack are "stray" single quotes (').
Everything contained between a pair of single quotes is processed as it should be, but an unpaired single quote may end the string and allow the rest of the entry to act as executable code.
That is exactly the case in @hynekcer's example.
Django provides the Value expression to prevent the above:
The value will be added into the SQL parameter list and properly quoted.
So if you make sure to pass every piece of user input through Value you will be fine:
from django.db.models import Value

search = user_input
qs = A.objects.filter(
    title__icontains=search
).annotate(
    pos=Position('title', Value(search))
).order_by('pos').values_list('title', flat=True)
EDIT:
As stated in the comments, that doesn't seem to work as expected with the above setup. But if the call is made as follows, it works:
pos=Func(F('title'), Value(search), function='INSTR')
As a side note: why mess with the templates in the first place?
You can find the function you want in each database dialect (e.g. SQLite, PostgreSQL, MySQL etc.) and use it explicitly:
class Position(Func):
    function = 'POSITION'  # MySQL default in your example

    def as_sqlite(self, compiler, connection):
        return self.as_sql(compiler, connection, function='INSTR')

    def as_postgresql(self, compiler, connection):
        return self.as_sql(compiler, connection, function='STRPOS')
    ...
EDIT:
You can use other functions (like the LOWER function) inside a Func call as follows:
pos=Func(Lower(F('title')), Lower(Value(search)), function='INSTR')

Based on John Moutafis's ideas, the final function is below (inside the __init__ method we use Value for a safe result):
from django.db.models import Func, F, Value
from django.db.models.functions import Lower

class Instr(Func):
    function = 'INSTR'

    def __init__(self, string, substring, insensitive=False, **extra):
        if not substring:
            raise ValueError('Empty substring not allowed')
        if not insensitive:
            expressions = F(string), Value(substring)
        else:
            expressions = Lower(string), Lower(Value(substring))
        super(Instr, self).__init__(*expressions)

    def as_postgresql(self, compiler, connection):
        return self.as_sql(compiler, connection, function='STRPOS')
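For completeness, a usage sketch of the Instr class above, mirroring the queryset from the question (same model A and search term as before):

search = 'port'
qs = A.objects.filter(
    title__icontains=search
).annotate(
    pos=Instr('title', search, insensitive=True)
).order_by('pos').values_list('title', flat=True)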

Related

where to define a slug without a models.py in Django

I am rather new to Django, and am looking at where to define a slug in Django when creating a backend without models. The URL is created as such:
url(r'^main/(?P<slug>[-\w]+)/', include('main.urls')),
I have slugs within my main.urls which I define inside each view function. I'm not exactly sure where to define this slug (link, whatever you may call it). In other Django slug examples, the common way is in a model, but I am currently talking to a program rather than creating my own models.
Would this be in the urls.py, or views.py (in the project, not app)?
Thank you so much. Hopefully this is understandable.
It's not hard. Really.
In URLconfs each entry is simply a regular expression which has to match a URL that is visited by an end user. r'^main/(?P<slug>[-\w]+)/' will, for example, match http://localhost:8000/main/some-slug/.
You can use a special kind of syntax in your regular expression to extract matched data and pass that data as a variable to your view function.
The bit that does that is (?P<slug>[-\w]+): it puts the matched characters (in this case a slug) into a variable called slug (the <slug> part defines the variable name). In this humble example the slug variable will be set to "some-slug".
The variable will be accessible in your view like this:
from django.http import HttpResponse

def handle_my_view(request, slug='homepage'):
    # do stuff with slug
    return HttpResponse("I did stuff with slug: {}".format(slug))
Learn more about, and fiddle with, regular expressions at http://www.regexr.com.
But why do I see slugs used in models?
A slug (or named variable, coming from a URL 'interception') can be used for anything. Commonly the slug variable itself will be used to retrieve a database record of some sort... and that involves using models.
You can do whatever you want with them; add stuff, subtract stuff, capitalize, whatever. The sky is the limit.
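As a hedged sketch of that common lookup pattern (Article is a hypothetical model with a slug field):

from django.shortcuts import get_object_or_404, render

def article_detail(request, slug=None):
    # the captured slug is used to fetch exactly one record, or return 404
    article = get_object_or_404(Article, slug=slug)
    return render(request, 'article_detail.html', {'article': article})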
From the Django docs:
https://docs.djangoproject.com/en/1.10/topics/http/urls/#named-groups
Named groups
The above example used simple, non-named regular-expression groups (via parenthesis) to capture bits of the URL and pass them as positional arguments to a view. In more advanced usage, it’s possible to use named regular-expression groups to capture URL bits and pass them as keyword arguments to a view.
In Python regular expressions, the syntax for named regular-expression groups is (?P<name>pattern), where name is the name of the group and pattern is some pattern to match.
Here’s the above example URLconf, rewritten to use named groups:
from django.conf.urls import url
from . import views
urlpatterns = [
    url(r'^articles/2003/$', views.special_case_2003),
    url(r'^articles/(?P<year>[0-9]{4})/$', views.year_archive),
    url(r'^articles/(?P<year>[0-9]{4})/(?P<month>[0-9]{2})/$', views.month_archive),
    url(r'^articles/(?P<year>[0-9]{4})/(?P<month>[0-9]{2})/(?P<day>[0-9]{2})/$', views.article_detail),
]
This accomplishes exactly the same thing as the previous example, with one subtle difference: The captured values are passed to view functions as keyword arguments rather than positional arguments. For example:
A request to /articles/2005/03/ would call the function views.month_archive(request, year='2005', month='03'), instead of views.month_archive(request, '2005', '03').
A request to /articles/2003/03/03/ would call the function views.article_detail(request, year='2003', month='03', day='03').
In practice, this means your URLconfs are slightly more explicit and less prone to argument-order bugs – and you can reorder the arguments in your views’ function definitions. Of course, these benefits come at the cost of brevity; some developers find the named-group syntax ugly and too verbose.

Building Django Q() objects from other Q() objects, but with relation crossing context

I commonly find myself writing the same criteria in my Django application(s) more than once. I'll usually encapsulate it in a function that returns a Django Q() object, so that I can maintain the criteria in just one place.
I will do something like this in my code:
def CurrentAgentAgreementCriteria(useraccountid):
    '''Returns Q that finds agent agreements that gives the useraccountid account current delegated permissions.'''
    AgentAccountMatch = Q(agent__account__id=useraccountid)
    StartBeforeNow = Q(start__lte=timezone.now())
    EndAfterNow = Q(end__gte=timezone.now())
    NoEnd = Q(end=None)
    # Now put the criteria together
    AgentAgreementCriteria = AgentAccountMatch & StartBeforeNow & (NoEnd | EndAfterNow)
    return AgentAgreementCriteria
This makes it so that I don't have to think through the DB model more than once, and I can combine the return values from these functions to build more complex criteria. That works well so far, and has already saved me time when the DB model changed.
Something I have realized as I start to combine the criteria from these functions is that a Q() object is inherently tied to the type of object that .filter() is being called on. That is what I would expect.
I occasionally find myself wanting to use a Q() object from one of my functions to construct another Q object that is designed to filter a different, but related, model's instances.
Let's use a simple/contrived example to show what I mean. (It's simple enough that normally this would not be worth the overhead, but remember that I'm using a simple example here to illustrate what is more complicated in my app.)
Say I have a function that returns a Q() object that finds all Django users, whose username starts with an 'a':
def UsernameStartsWithAaccount():
    return Q(username__startswith='a')
Say that I have a related model that is a user profile with settings including whether they want emails from us:
class UserProfile(models.Model):
    account = models.OneToOneField(User, unique=True, related_name='azendalesappprofile')
    emailMe = models.BooleanField(default=False)
Say I want to find all UserProfiles which have a username starting with 'a' AND want us to send them some email newsletter. I can easily write a Q() object for the latter:
wantsEmails = Q(emailMe=True)
but find myself wanting to do something like this for the former:
startsWithA = Q(account=UsernameStartsWithAaccount())
# And then
UserProfile.objects.filter(startsWithA & wantsEmails)
Unfortunately, that doesn't work (it generates invalid PSQL syntax when I tried it).
To put it another way, I'm looking for a syntax along the lines of Q(account=Q(id=9)) that would return the same results as Q(account__id=9).
So, a few questions arise from this:
Is there a syntax with Django Q() objects that allows you to add "context" to them to allow them to cross relational boundaries from the model you are running .filter() on?
If not, is this logically possible? (Since I can write Q(account__id=9) when I want to do something like Q(account=Q(id=9)) it seems like it would).
Maybe someone will suggest something better, but I ended up passing the context manually to such functions. I don't think there is an easy solution, as you might need to go through a whole chain of related tables to get to your field, like table1__table2__table3__profile__user__username; how would the function guess that? The User table could be linked to table2 too, but you don't need that join in this case, so I think you can't avoid setting the path manually.
Also, you can pass a dictionary to Q() and a list or a dictionary to filter(), which is much easier to work with than using keyword parameters and applying &.
def UsernameStartsWithAaccount(context=''):
    field = 'username__startswith'
    if context:
        field = context + '__' + field
    return Q(**{field: 'a'})
Then if you simply need to AND your conditions you can combine them into a list and pass to filter:
UserProfile.objects.filter(*[startsWithA, wantsEmails])
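And a small sketch of the dictionary form mentioned above, since filter() keyword arguments can be built as a dict and unpacked (the lookups reuse the models from the question):

criteria = {
    'account__username__startswith': 'a',
    'emailMe': True,
}
UserProfile.objects.filter(**criteria)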

hybrid property with join in sqlalchemy

I have probably not grasped the use of @hybrid_property fully. What I am trying to do is to make it easy to access a calculated value based on a column in another table, which therefore requires a join.
So what I have is something like this (which works but is awkward and feels wrong):
class Item():
    :
    @hybrid_property
    def days_ago(self):
        # Can I even write a python version of this ?
        pass

    @days_ago.expression
    def days_ago(cls):
        return func.datediff(func.NOW(), func.MAX(Event.date_started))
This requires the caller to add the join on the Action table whenever the days_ago property is used. Is hybrid_property even the correct approach to simplifying my queries where I need to get hold of the days_ago value?
One way or another you need to load or access the Action rows, either via a join or via a lazy load (note that it's not clear here what Event vs. Action is; I'm assuming you have just Item.actions -> Action).
The non-"expression" version of days_ago is intended to work against Action objects that are relevant only to the current instance. Normally within a hybrid, this means just iterating through Item.actions and performing the operation in Python against the loaded Action objects. Though since in this case you're looking for a simple aggregate, you could instead opt to run a query, but again it would be local to self, so it would look like object_session(self).query(func.datediff(...)).select_from(Action).with_parent(self).scalar().
The expression version of the hybrid when formed against another table typically requires that the query in which it is used already have the correct FROM clauses set up, so it would look like session.query(Item).join(Item.actions).filter(Item.days_ago == xyz). This is explained at Join-Dependent Relationship Hybrid.
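A hedged sketch of that join-dependent form, assuming an Item.actions -> Action relationship and an Action.date_started column (all names and the MySQL-style datediff are guesses based on the question):

from datetime import datetime
from sqlalchemy import Column, Date, ForeignKey, Integer, func
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.orm import relationship

Base = declarative_base()

class Action(Base):
    __tablename__ = 'action'
    id = Column(Integer, primary_key=True)
    item_id = Column(Integer, ForeignKey('item.id'))
    date_started = Column(Date)

class Item(Base):
    __tablename__ = 'item'
    id = Column(Integer, primary_key=True)
    actions = relationship(Action)

    @hybrid_property
    def days_ago(self):
        # Python side: aggregate over the already-loaded Action objects
        latest = max(action.date_started for action in self.actions)
        return (datetime.utcnow().date() - latest).days

    @days_ago.expression
    def days_ago(cls):
        # SQL side: the caller supplies the join and grouping, e.g.
        # session.query(Item).join(Item.actions).group_by(Item.id).having(Item.days_ago > 30)
        return func.datediff(func.now(), func.max(Action.date_started))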
Your expression here might be better produced as a column_property, if you can afford a correlated subquery. See http://docs.sqlalchemy.org/en/latest/orm/mapping_columns.html#using-column-property-for-column-level-options.
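A sketch of that column_property alternative, reusing the Base and Action classes from the previous sketch (SQLAlchemy 1.x style, following the pattern in the linked docs; this would replace the hybrid mapping of Item rather than coexist with it):

from sqlalchemy import Column, Integer, func, select
from sqlalchemy.orm import column_property

class Item(Base):
    __tablename__ = 'item'
    id = Column(Integer, primary_key=True)
    # a correlated scalar subquery evaluated whenever an Item row is loaded
    days_ago = column_property(
        select([func.datediff(func.now(), func.max(Action.date_started))])
        .where(Action.item_id == id)
        .correlate_except(Action)
    )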

icontains and SQL Security

I have a web app that allows users to enter a search query which will then retrieve models that match this search criteria. Here are my methods:
@staticmethod
def searchBody(query):
    '''
    Return all entries whose body text contains the query.
    '''
    return Entry.objects.get(text__icontains=query)

@staticmethod
def searchTitle(query):
    '''
    Return all entries whose title text contains the query.
    '''
    return Entry.objects.get(title__icontains=query)

@staticmethod
def searchAuthor(query):
    '''
    Return all entries whose author text contains the query.
    '''
    return Entry.objects.get(author.icontains=query)
My question is simply: is this secure as it stands? In other words, does icontains perform the necessary string escaping so that a person can't inject SQL or Python code into the query to launch an attack?
Yes, the Django ORM protects you against SQL injection.
Of course you can never be entirely sure that there is no security vulnerability in an application. Nevertheless, the ORM is the component responsible for protecting you against SQL injection, so you should assume it's safe and keep your Django install up to date!
On an unrelated note, there is a typo in Entry.objects.get(author.icontains=query).
Also, using .get is going to throw a lot of errors here (whenever the object doesn't exist, or more than one exists). It doesn't do what your docstrings say either.
You probably want to be using .filter instead.
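For example, the first helper rewritten with .filter(), which returns a (possibly empty) queryset of all matching entries instead of raising:

@staticmethod
def searchBody(query):
    '''Return all entries whose body text contains the query.'''
    return Entry.objects.filter(text__icontains=query)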

How can you keep the Django ORM from making mistakes when you pass the wrong kind of object?

We found this while testing: one machine was set up with MyISAM as the default engine and one was set up with InnoDB as the default engine. We have code similar to the following:
class StudyManager(models.Manager):
    def scored(self, school=None, student=None):
        qset = self.objects.all()
        if school:
            qset = qset.filter(school=school)
        if student:
            qset = qset.filter(student=student)
        return qset.order_by('something')
The problem code looked like this:
print Study.objects.scored(student).count()
which meant that the "student" was being treated as a school. This got through testing with MyISAM because student.id == school.id: MyISAM can't do a rollback, so the tables get completely re-created for each test (resetting the autoincrement id field). InnoDB caught these errors because a rollback evidently does not reset the autoincrement fields.
Problem is, during testing there could be many other errors going uncaught due to duck typing, since all models have an id field. I'm worried about the ids of objects lining up (in production or in testing) and that causing problems/failing to find the bugs.
I could add asserts like so:
class StudyManager(models.Manager):
    def scored(self, school=None, student=None):
        qset = self.objects.all()
        if school:
            assert(isinstance(school, School))
            qset = qset.filter(school=school)
        if student:
            assert(isinstance(student, Student))
            qset = qset.filter(student=student)
        return qset.order_by('something')
But this looks nasty, and is a lot of work (to go back and retrofit). It's also slower in debug mode.
I've thought about the idea that the id field on the models could be renamed to model_id (student_id for Student, school_id for School), so that a School would not have a student_id. This would only involve specifying the primary key field, but Django has a shortcut for that in .pk, so I'm guessing it might not help in all cases.
Is there a more elegant solution to catching this kind of bug? Being an old C++ hand, I kind of miss type safety.
This is an aspect of Python and has nothing to do with Django per se.
By defining default values for function parameters you do not eliminate the concept of positional arguments; you simply make it possible to not specify all parameters when invoking the function. @mVChr is correct in saying that you need to get in the habit of using the parameter name(s) when you call the routine, particularly when there is inherent ambiguity in just what it is being called with.
You might also consider having two separate routines whose names quite clearly identify their expected parameter types.
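A quick sketch of both suggestions, against the manager from the question (the split method names here are made up):

# 1. Always pass the parameter by name, never by position:
Study.objects.scored(student=student)

# 2. Or split the method so the name states the expected type:
class StudyManager(models.Manager):
    def scored_for_student(self, student):
        return self.filter(student=student).order_by('something')

    def scored_for_school(self, school):
        return self.filter(school=school).order_by('something')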
