icontains and SQL Security - python

I have a web app that allows users to enter a search query, which then retrieves the models that match the search criteria. Here are my methods:
@staticmethod
def searchBody(query):
    '''
    Return all entries whose body text contains the query.
    '''
    return Entry.objects.get(text__icontains=query)
@staticmethod
def searchTitle(query):
    '''
    Return all entries whose title text contains the query.
    '''
    return Entry.objects.get(title__icontains=query)
@staticmethod
def searchAuthor(query):
    '''
    Return all entries whose author text contains the query.
    '''
    return Entry.objects.get(author.icontains=query)
My question is simply: is this secure as it stands? In other words, does icontains perform the necessary string escaping so that a person can't inject SQL or Python code into the query to launch an attack?

Yes, the Django ORM protects you against SQL injection.
Of course, you can never be entirely sure that an application has no security vulnerabilities. Nevertheless, the ORM is the component responsible for protecting you against SQL injection, so you should assume it's safe and keep your Django installation up to date!
On an unrelated note, there is a typo in Entry.objects.get(author.icontains=query).
Also, using .get is going to raise a lot of errors here (whenever no matching object exists, or more than one exists). It doesn't do what your docstrings say either.
You probably want to be using .filter instead.
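To illustrate the mechanism the ORM relies on, here is a minimal sketch using the standard-library sqlite3 module rather than Django itself. The table name, the sample rows and the search_title helper are assumptions made up for the illustration; the point is only that the user's text travels as a bound parameter and is never spliced into the SQL string. Unlike Django, this sketch does not escape the LIKE wildcards % and _ in the input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO entry (title) VALUES (?)",
    [("Hello world",), ("HELLO again",), ("Goodbye",)],
)


def search_title(query):
    """Return every title containing the query (hypothetical helper)."""
    # The input is sent as a parameter, so quotes in it are just data.
    # SQLite's LIKE ignores case for ASCII letters, loosely like icontains.
    pattern = "%" + query + "%"
    rows = conn.execute(
        "SELECT title FROM entry WHERE title LIKE ? ORDER BY id", (pattern,)
    )
    return [title for (title,) in rows]


matches = search_title("hello")
attack = search_title("'; DROP TABLE entry; --")
remaining = conn.execute("SELECT COUNT(*) FROM entry").fetchone()[0]
```

The hostile string simply matches nothing, and the table still holds all of its rows. Returning a list of all matches also mirrors what .filter does, as opposed to .get.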

Related

Where to put 'create_function' statements in Django

I'm working on a project with the standard Django + SQLite setup.
I've come to a situation where the Django QuerySet API and even Django's raw() are not enough for me to retrieve the information I need from the database.
So I execute custom SQL directly to retrieve the data. I also need to define a 'lower' function through the 'create_function' interface, because SQLite can't perform case-insensitive sorting on Unicode fields (https://www.sqlite.org/faq.html#q18).
My question is: where should I put this 'create_function' statement? Is it normal to put it directly into a Django view, so that it is executed every time the view is loaded, or should I put it somewhere else, where it would be executed only once?
from django.db import connection
def lower(text):
    if text:
        return text.lower()
    else:
        return text
def my_view(request):
    ...
    with connection.cursor() as cursor:
        if (connection.vendor == 'sqlite'):
            connection.connection.create_function('lower', 1, lower)
        cursor.execute('SELECT * FROM <my complicated select using lower>;')
        data = dictfetchall(cursor)
    ...
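One way to think about the "only once" part: create_function is registered per connection, so it can be called at the point where the connection is opened rather than in each view. The sketch below shows this with the standard-library sqlite3 module alone; open_connection, the person table and the sample names are hypothetical. In a Django project the equivalent hook might be a receiver for the connection_created signal in django.db.backends.signals, but that wiring is an assumption about the setup and is not shown here.

```python
import sqlite3


def lower(text):
    # Python's str.lower() handles non-ASCII letters; None passes through.
    return text.lower() if text else text


def open_connection(path=":memory:"):
    """Hypothetical helper: register the function once per connection."""
    conn = sqlite3.connect(path)
    # Overrides SQLite's built-in lower(), which only folds ASCII letters.
    conn.create_function("lower", 1, lower)
    return conn


conn = open_connection()
conn.execute("CREATE TABLE person (name TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?)", [("Émile",), ("éclair",), ("École",)]
)

# Every later query on this connection can use the Unicode-aware lower().
names = [n for (n,) in conn.execute("SELECT name FROM person ORDER BY lower(name)")]
lowered = conn.execute("SELECT lower(?)", ("ÉCOLE",)).fetchone()[0]
```

With the custom function in place, the upper-case and lower-case accented names sort together instead of being split by case.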

Django custom for complex Func (sql function)

In the process of finding a solution for Django ORM order by exact, I created a custom django Func:
from django.db.models import Func
class Position(Func):
    function = 'POSITION'
    template = "%(function)s(LOWER('%(substring)s') in LOWER(%(expressions)s))"
    template_sqlite = "instr(lower(%(expressions)s), lower('%(substring)s'))"
    def __init__(self, expression, substring):
        super(Position, self).__init__(expression, substring=substring)
    def as_sqlite(self, compiler, connection):
        return self.as_sql(compiler, connection, template=self.template_sqlite)
which works as follows:
class A(models.Model):
    title = models.CharField(max_length=30)
data = ['Port 2', 'port 1', 'A port', 'Bport', 'Endport']
for title in data:
    A.objects.create(title=title)
search = 'port'
qs = A.objects.filter(
    title__icontains=search
).annotate(
    pos=Position('title', search)
).order_by('pos').values_list('title', flat=True)
# result is
# ['Port 2', 'port 1', 'Bport', 'A port', 'Endport']
But as @hynekcer commented:
"It crashes easily with ') in '') from myapp_suburb; drop ... , assuming that the name of the app is "myapp" and autocommit is enabled."
The main problem is that the extra data (substring) gets into the template without SQL escaping, which leaves the app vulnerable to SQL injection attacks.
I cannot find the Django way to protect against that.
I created a repo (djposfunc) where you can test any solution.
TL;DR:
All examples with Func() in the Django docs can easily be used to safely implement other similar SQL functions with one argument.
All built-in Django database functions and conditional functions that are descendants of Func() are also safe by design. Usage beyond these limits needs comment.
The class Func() is the most general part of Django's query expressions. It allows you to implement almost any function or operator in the Django ORM in some way. It is like a Swiss Army knife: very universal, but you must be a little more careful not to cut yourself than with a specialized tool (like an electric cutter with an optical barrier). It is still much more secure than forging your own tool from a piece of iron with a hammer once an "upgraded", "secure" pocket knife no longer fits into the pocket.
Security notes
The short documentation for Func(*expressions, **extra), with its examples, should be read first. (I recommend the development docs for Django 2.0, where more security information was recently added, including the section Avoiding SQL injection, which relates exactly to your example.)
All positional arguments in *expressions are compiled by Django; that is, Value(string) expressions are moved to the query parameters, where they are correctly escaped by the database driver.
Other strings are interpreted as field names, F(name), then prefixed with the right table name or alias and a dot; a join to that table is added if necessary, and the names are passed through the quote_name() function.
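The difference between a value pasted into a template and a value sent as a parameter can be seen without Django. The following is a minimal sketch with the standard-library sqlite3 module; the table a and both helper functions are assumptions for illustration, with positions_unsafe imitating the apostrophe-wrapped template from the question and positions_safe imitating what happens to a Value() argument.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO a (title) VALUES (?)",
    [("Port 2",), ("port 1",), ("A port",), ("Bport",), ("Endport",)],
)


def positions_unsafe(substring):
    # The value is interpolated between apostrophes, as in the question's template.
    sql = "SELECT title FROM a ORDER BY instr(lower(title), lower('%s')), id" % substring
    return [t for (t,) in conn.execute(sql)]


def positions_safe(substring):
    # The value is bound as a parameter; the driver takes care of quoting.
    sql = "SELECT title FROM a ORDER BY instr(lower(title), lower(?)), id"
    return [t for (t,) in conn.execute(sql, (substring,))]


ordered = positions_safe("port")

try:
    positions_unsafe("o'r")  # a single apostrophe already breaks the statement
    unsafe_failed = False
except sqlite3.Error:
    unsafe_failed = True

quoted_ok = positions_safe("o'r")  # the same input is harmless as a parameter
```

The interpolated variant fails on a lone apostrophe, which is the same opening an attacker would use, while the bound variant treats the apostrophe as ordinary data.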
The problem is that the documentation in 1.11 is still sparse, and the seductive parameters **extra and **extra_context are documented vaguely. They can be used only for simple parameters that will never be "compiled" and never go through SQL params. Numbers or simple strings with safe characters (no apostrophe, backslash or percent) are fine. Such a parameter cannot be a field name, because it would be neither disambiguated nor joined. It is safe for previously checked numbers and fixed strings like 'ASC'/'DESC', timezone names and other values, such as those from a drop-down list. There is still a weak point: drop-down list values must be checked on the server side. Numbers must also be verified to be real numbers, not numeric strings like '2', because database functions silently accept a numeric string in place of a number. If a false "number" such as '0) from my_app.my_table; rogue_sql; --' is passed, the injection has succeeded. Note that the rogue string does not contain any obviously forbidden character in this case. User-supplied numbers must be checked specifically, or the value must be passed through the positional expressions.
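The server-side checks described above could be sketched as two small guards that run before a value is allowed anywhere near **extra. The names checked_int, checked_direction and ALLOWED_DIRECTIONS are hypothetical and not part of Django; this is one possible shape under those assumptions, not a prescribed API.

```python
ALLOWED_DIRECTIONS = {"ASC", "DESC"}  # fixed strings that are safe to template


def checked_direction(value):
    """Accept only values from a fixed whitelist (e.g. a drop-down list)."""
    if value not in ALLOWED_DIRECTIONS:
        raise ValueError("unexpected sort direction: %r" % (value,))
    return value


def checked_int(value):
    """Accept a real integer, never a numeric string such as '2'."""
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("expected an int, got %r" % (value,))
    return value


limit = checked_int(10)
direction = checked_direction("DESC")
```

A string such as '0) from my_app.my_table; rogue_sql; --' is rejected by checked_int because of its type, without any inspection of its characters.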
It is safe to specify the function name and the arg_joiner string as attributes of a Func class, or as the function and arg_joiner parameters of a Func() call. The template parameter should never contain apostrophes around the substituted expressions inside parentheses, ( %(expressions)s ), because the database driver adds apostrophes where necessary. Additional apostrophes usually make the query fail, but sometimes the failure could be overlooked, and that would lead to another security issue.
Notes not related to security
Many simple built-in functions with one argument do not look as simple as they could, because they are derived from multi-purpose descendants of Func. For example, Length is a function that can also be used as a lookup Transform.
class Length(Transform):
    """Return the number of characters in the expression."""
    function = 'LENGTH'
    output_field = fields.IntegerField()  # the output type is sometimes specified
    # lookup_name = 'length'  # useful for lookup not for Func usage
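A transform used in a lookup applies the function to the left side of the lookup; only a transform declared with bilateral = True applies it to the right side as well. Length is not bilateral, so the right side is a plain number:
# I'm searching for people with usernames longer than mine
qs = User.objects.filter(username__length__gt=len(my_username))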
The keyword arguments accepted by Func.as_sql(..., function=..., template=..., arg_joiner=...) can also be passed to Func.__init__(), provided a custom as_sql() does not override them, or they can be set as attributes of a custom subclass of Func.
Many SQL database functions have a verbose syntax like POSITION(substring IN string), which helps readability when named parameters are not supported (compare POSITION($1 IN $2)), and a brief variant such as STRPOS(string, substring) (for PostgreSQL) or INSTR(string, substring) (for other databases). The brief variant is easier to implement with Func(), and readability is restored by the Python wrapper with __init__(expression, substring).
Even very complicated functions can be implemented in a safe way by combining nested functions with simple arguments: Case(When(field_name=lookup_value, then=Value(value)), When(...), ..., default=Value(value)).
Usually, what leaves you vulnerable to an SQL injection attack are "stray" single quotes (').
Everything between a pair of single quotes is processed as a string, as it should be, but an unpaired single quote may end the string and allow the rest of the input to act as executable code.
That is exactly the case in @hynekcer's example.
Django provides the Value expression to prevent this:
The value will be added into the SQL parameter list and properly quoted.
So if you make sure to pass every user input through Value, you will be fine:
from django.db.models import Value
search = user_input
qs = (
    A.objects.filter(title__icontains=search)
    .annotate(pos=Position('title', Value(search)))
    .order_by('pos').values_list('title', flat=True)
)
EDIT:
As stated in the comments, that doesn't seem to work as expected with the Position class above. But if the call is written as follows, it works:
pos=Func(F('title'), Value(search), function='INSTR')
As a side note: Why mess with the templates in the first place?
You can find the function you want to use in any database dialect (e.g. SQLite, PostgreSQL, MySQL) and use it explicitly:
class Position(Func):
    function = 'POSITION'  # MySQL default in your example
    def as_sqlite(self, compiler, connection):
        return self.as_sql(compiler, connection, function='INSTR')
    def as_postgresql(self, compiler, connection):
        return self.as_sql(compiler, connection, function='STRPOS')
    ...
EDIT:
You can use other functions (like the LOWER function) inside a Func call as follows:
pos=Func(Lower(F('title')), Lower(Value(search)), function='INSTR')
Based on John Moutafis's ideas, the final function is as follows (inside the __init__ method we use Value for a safe result):
from django.db.models import Func, F, Value
from django.db.models.functions import Lower
class Instr(Func):
    function = 'INSTR'
    def __init__(self, string, substring, insensitive=False, **extra):
        if not substring:
            raise ValueError('Empty substring not allowed')
        if not insensitive:
            expressions = F(string), Value(substring)
        else:
            expressions = Lower(string), Lower(Value(substring))
        super(Instr, self).__init__(*expressions)
    def as_postgresql(self, compiler, connection):
        return self.as_sql(compiler, connection, function='STRPOS')

Page.query.filter(Page.url.contains(url)) function definition?

While reading someone's code in Python/Flask, I came across the line:
results = Page.query.filter(Page.url.contains(url))
I've searched for it but can't find a satisfying answer. What exactly do the functions query.filter and url.contains do, and what values do they return under different conditions, such as when there are no matches, when there are multiple matches, or when the table doesn't exist? Is Page the name of the table or just the name of the class?
Edit: Function in which the line is used
@pages.route('/<url>/', methods=('GET', 'POST'))
def url_view(url):
    from app import get_locale
    page = DataGetter.get_page_by_url('/' + url, get_locale())
    return render_template('gentelella/guest/page.html', page=page)
def get_page_by_url(url, selected_language=False):
    if selected_language:
        results = Page.query.filter_by(language=selected_language).filter(Page.url.contains(url))
    else:
        results = Page.query.filter(Page.url.contains(url))
    if results:
        return results.first()
    return results
I think you need to do a bit more research on your own, but to get you started: this looks like a Flask-SQLAlchemy project, which is a Flask wrapper around the SQLAlchemy library. The documentation you need to read is mostly in SQLAlchemy and Flask-SQLAlchemy.
Page appears to be an ORM mapping of a database table. The filter/filter_by methods encapsulate the filtering (WHERE) part of an SQL SELECT. The contains method encapsulates the SQL LIKE '%<OTHER>%' functionality.

python database implementation

I am trying to implement a simple database program in Python. I have reached the point where I can add elements to the db, change their values, etc.
class db:
    def __init__(self):
        self.database ={}
    def dbset(self, name, value):
        self.database[name]=value
    def dbunset(self, name):
        self.dbset(name, 'NULL')
    def dbnumequalto(self, value):
        mylist = [v for k,v in self.database.items() if v==value]
        return mylist
def main():
    mydb=db()
    cmd=raw_input().rstrip().split(" ")
    while cmd[0]!='end':
        if cmd[0]=='set':
            mydb.dbset(cmd[1], cmd[2])
        elif cmd[0]=='unset':
            mydb.dbunset(cmd[1])
        elif cmd[0]=='numequalto':
            print len(mydb.dbnumequalto(cmd[1]))
        elif cmd[0]=='list':
            print mydb.database
        cmd=raw_input().rstrip().split(" ")
if __name__=='__main__':
    main()
Now, as a next step, I want to be able to do nested transactions within this Python code. I begin a set of commands with a BEGIN command and then commit them with a COMMIT statement. A commit should commit all the transactions that have begun. However, a rollback should revert the changes back to the most recent BEGIN. I am not able to come up with a suitable solution for this.
A simple approach is to keep a "transaction" list containing all the information you need to be able to roll-back pending changes:
def dbset(self, name, value):
    self.transaction.append((name, self.database.get(name)))
    self.database[name]=value
def rollback(self):
    # undo all changes
    while self.transaction:
        name, old_value = self.transaction.pop()
        self.database[name] = old_value
def commit(self):
    # everything went fine, drop undo information
    self.transaction = []
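For the nested case asked about in the question, the same undo-log idea can be extended by keeping one log per open BEGIN on a stack. The sketch below is one possible arrangement, not the answerer's code: the DB class, the begin method and the _MISSING sentinel (used so that a rollback can delete keys that did not exist before) are assumptions added for illustration.

```python
class DB:
    _MISSING = object()  # marks "key did not exist before this change"

    def __init__(self):
        self.database = {}
        self.transactions = []  # stack of undo logs, one per open BEGIN

    def begin(self):
        self.transactions.append([])

    def dbset(self, name, value):
        if self.transactions:
            # Record the previous value in the innermost open transaction.
            old = self.database.get(name, self._MISSING)
            self.transactions[-1].append((name, old))
        self.database[name] = value

    def rollback(self):
        """Undo only the changes made since the most recent BEGIN."""
        if not self.transactions:
            raise RuntimeError("no transaction in progress")
        log = self.transactions.pop()
        while log:
            name, old = log.pop()
            if old is self._MISSING:
                self.database.pop(name, None)
            else:
                self.database[name] = old

    def commit(self):
        """Make everything permanent by dropping all undo information."""
        self.transactions = []


mydb = DB()
mydb.dbset("a", "10")
mydb.begin()
mydb.dbset("a", "20")
mydb.begin()
mydb.dbset("a", "30")
mydb.dbset("b", "1")
mydb.rollback()
after_inner = dict(mydb.database)
mydb.rollback()
after_outer = dict(mydb.database)
```

Each rollback pops exactly one log, so it reverts to the most recent BEGIN, while commit discards every log at once, matching the behaviour described in the question.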
If you are doing this as an academic exercise, you might want to check out the Rudimentary Database Engine recipe on the Python Cookbook. It includes quite a few classes to facilitate what you might expect from a SQL engine.
Database is used to create database instances without transaction support.
Database2 inherits from Database and provides for table transactions.
Table implements database tables along with various possible interactions.
Several other classes act as utilities to support some database actions that would normally be supported.
Like and NotLike implement the LIKE operator found in other engines.
date and datetime are special data types usable for database columns.
DatePart, MID, and FORMAT allow information selection in some cases.
In addition to the classes, there are functions for JOIN operations along with tests / demonstrations.
This is all available for free in the built-in sqlite3 module. Commits and rollbacks for SQLite are discussed in more detail than I can understand here
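As a rough illustration of what the sqlite3 module offers here, nested rollback can be expressed with SQLite's SAVEPOINT statements. This is a sketch under assumptions: the kv table and the savepoint name inner_tx are invented, and isolation_level=None is used so that the module does not open transactions implicitly and the BEGIN/COMMIT statements are issued by hand.

```python
import sqlite3

# isolation_level=None: the module stays out of the way and we control
# transactions explicitly with SQL statements.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE kv (name TEXT PRIMARY KEY, value TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO kv VALUES ('a', '10')")

conn.execute("SAVEPOINT inner_tx")  # start of the nested transaction
conn.execute("UPDATE kv SET value = '20' WHERE name = 'a'")
conn.execute("ROLLBACK TO inner_tx")  # undo only the nested changes
conn.execute("RELEASE inner_tx")

conn.execute("COMMIT")  # the outer insert survives

value = conn.execute("SELECT value FROM kv WHERE name = 'a'").fetchone()[0]
```

The update made after the savepoint is discarded, while the insert made in the outer transaction is committed.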

Is there a way to transparently perform validation on SQLAlchemy objects?

Is there a way to perform validation on an object after (or as) the properties are set but before the session is committed?
For instance, I have a domain model Device that has a mac property. I would like to ensure that the mac property contains a valid and sanitized mac value before it is added to or updated in the database.
It looks like the Pythonic approach is to do most things as properties (including SQLAlchemy). If I had coded this in PHP or Java, I would probably have opted to create getter/setter methods to protect the data and give me the flexibility to handle this in the domain model itself.
public function mac() { return $this->mac; }
public function setMac($mac) {
    return $this->mac = $this->sanitizeAndValidateMac($mac);
}
public function sanitizeAndValidateMac($mac) {
    if ( ! preg_match(self::$VALID_MAC_REGEX, $mac) ) {
        throw new InvalidMacException($mac);
    }
    return strtolower($mac);
}
What is a Pythonic way to handle this type of situation using SQLAlchemy?
(While I'm aware that validation should be handled elsewhere (i.e., in the web framework), I would like to figure out how to handle some of these domain-specific validation rules, as they are bound to come up frequently.)
UPDATE
I know that I could use property to do this under normal circumstances. The key part is that I am using SQLAlchemy with these classes. I do not understand exactly how SQLAlchemy is performing its magic but I suspect that creating and overriding these properties on my own could lead to unstable and/or unpredictable results.
You can add data validation inside your SQLAlchemy classes using the @validates() decorator.
From the docs - Simple Validators:
An attribute validator can raise an exception, halting the process of mutating the attribute’s value, or can change the given value into something different.
from sqlalchemy.orm import validates
class EmailAddress(Base):
    __tablename__ = 'address'
    id = Column(Integer, primary_key=True)
    email = Column(String)
    @validates('email')
    def validate_email(self, key, address):
        # you can use assertions, such as
        # assert '@' in address
        # or raise an exception:
        if '@' not in address:
            raise ValueError('Email address must contain an @ sign.')
        return address
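Applied to the mac property from the question, the body of such a validator might look like the plain function below, which a method decorated with @validates('mac') could call and return. This is only a sketch: the accepted format (six colon-separated hex pairs, with dashes tolerated on input) and the names VALID_MAC_RE and sanitize_and_validate_mac are assumptions, since the question does not show its regular expression.

```python
import re

# Assumed format: six pairs of hex digits separated by colons.
VALID_MAC_RE = re.compile(r"[0-9a-f]{2}(:[0-9a-f]{2}){5}")


def sanitize_and_validate_mac(mac):
    """Return a normalized MAC address or raise ValueError."""
    # Normalize first: trim whitespace, lower-case, accept '-' as separator.
    cleaned = mac.strip().lower().replace("-", ":")
    if not VALID_MAC_RE.fullmatch(cleaned):
        raise ValueError("Invalid MAC address: %r" % (mac,))
    return cleaned


normalized = sanitize_and_validate_mac(" 00-1A-2B-3C-4D-5E ")
```

Because the function returns the cleaned value, a validator that returns its result would store the sanitized form rather than the raw input.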
Yes. This can be done nicely using a MapperExtension.
# uses sqlalchemy hooks to call data model class specific validators before update and insert
class ValidationExtension(sqlalchemy.orm.interfaces.MapperExtension):
    def before_update(self, mapper, connection, instance):
        """not every instance here is actually updated to the db, see http://www.sqlalchemy.org/docs/reference/orm/interfaces.html?highlight=mapperextension#sqlalchemy.orm.interfaces.MapperExtension.before_update"""
        instance.validate()
        return sqlalchemy.orm.interfaces.MapperExtension.before_update(self, mapper, connection, instance)
    def before_insert(self, mapper, connection, instance):
        instance.validate()
        return sqlalchemy.orm.interfaces.MapperExtension.before_insert(self, mapper, connection, instance)
sqlalchemy.orm.mapper(model, table, extension=ValidationExtension(), **mapper_args)
You may want to check the before_update reference, because not every instance passed to it is actually updated in the db.
"It looks like the Pythonic approach is to do most things as properties"
It varies, but that's close.
"If I had coded this in PHP or Java, I would probably have opted to create getter/setter methods..."
Good. That's Pythonic enough. Your getter and setter functions are bound up in a property; that's pretty good.
What's the question?
Are you asking how to spell property?
However, "transparent validation" -- if I read your example code correctly -- may not really be all that good an idea.
Your model and your validation should probably be kept separate. It's common to have multiple validations for a single model. For some users, fields are optional, fixed or not used; this leads to multiple validations.
You'll be happier following the Django design pattern of using a Form for validation, separate from the model.
