Simple query in Django - python

I'm trying to do a RAW Query like this:
User.objects.raw("SELECT username FROM app_user WHERE id != {0} AND LOWER(username) LIKE LOWER('%{1}%')".format('1','john'))
I get this error:
django.db.utils.ProgrammingError: not enough arguments for format string
The query works perfectly in SQLite but does not work in MySQL.

After you performed the formatting, Django obtains a query like:
SELECT username FROM app_user WHERE id != 1 AND LOWER(username) LIKE LOWER('%john%')
As you can see, this string contains %j and %'. The % character introduces a placeholder in Python's printf-style string formatting, which Django uses to inject parameters the proper way. It therefore looks for extra parameters, but it cannot find any.
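A minimal sketch of that behaviour in plain Python, outside Django and without any database driver. It only illustrates printf-style formatting itself: a doubled %% yields a literal percent sign, while a lone % followed by another character is read as a placeholder. The variable names are made up for the illustration.

```python
# Escaped: each literal percent sign is doubled, and %s marks the one real parameter.
escaped = "LIKE LOWER('%%%s%%')" % ("john",)
print(escaped)

# Unescaped: %j is read as the start of a placeholder, so formatting fails
# because there is nothing to substitute for it.
try:
    "LIKE LOWER('%john%')" % ()
    failed = False
except (TypeError, ValueError) as exc:
    failed = True
    print("formatting failed:", exc)
```

Note that this is string formatting only; when values are bound by a database driver, the wildcard characters belong in the parameter value, not in the SQL text.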
But regardless of what happens, this is not a good idea, since such queries are vulnerable to SQL injection. If 'john' is later replaced with '); DROP TABLE app_user -- (or something similar), then somebody can remove the entire table.
If you want to perform such a query, it should look like:
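User.objects.raw(
    # raw() needs the primary key in the result; the LIKE wildcards go in the
    # parameter value, because the driver quotes each parameter as a whole
    "SELECT id, username FROM app_user WHERE id != %s AND LOWER(username) LIKE LOWER(%s)",
    [1, '%john%']
)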
Or better: use the Django ORM:
User.objects.exclude(id=1).filter(
username__icontains='john'
).values_list('username', flat=True)
Or we can encode the full query like:
from django.db.models import Q, Value
from django.db.models.functions import Concat

User.objects.exclude(id=request.user.pk).annotate(
    flname=Concat('first_name', Value(' '), 'last_name')
).filter(
    Q(username__icontains=q) | Q(flname__icontains=q)
).values_list('id', 'username', 'first_name', 'last_name')
If you are after the User objects, and thus not so much the id, username, etc. columns themselves, then by dropping the .values_list(..) you get the User objects, not a QuerySet of tuples.

Related

How to avoid SQL Injection in Python for Upsert Query to SQL Server?

I have a sql query I'm executing that I'm passing variables into. In the current context I'm passing the parameter values in as f strings, but this query is vulnerable to sql injection. I know there is a method to use a stored procedure and restrict permissions on the user executing the query. But is there a way to avoid having to go the stored procedure route and perhaps modify this function to be secure against SQL Injection?
I have the below query created to execute within a python app.
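from sqlalchemy import create_engine

def sql_gen(tv, kv, join_kv, col_inst, val_inst, val_upd):
    # Every argument is interpolated straight into the SQL text,
    # which is what makes this version vulnerable to injection.
    sqlstmt = f"""
        IF NOT EXISTS (
            SELECT *
            FROM {tv}
            WHERE {kv} = {join_kv}
        )
        INSERT {tv} (
            {col_inst}
        )
        VALUES (
            {val_inst}
        )
        ELSE
        UPDATE {tv}
        SET {val_upd}
        WHERE {kv} = {join_kv};
    """
    # username, password, server and database are defined elsewhere in the app
    engine = create_engine(f"mssql+pymssql://{username}:{password}@{server}/{database}")
    connection = engine.raw_connection()
    cursor = connection.cursor()
    cursor.execute(sqlstmt)
    connection.commit()
    cursor.close()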
Fortunately, most database connectors support query parameters: you pass the values separately and let the driver bind them, instead of building them into the query string yourself, which avoids the risks you mentioned.
You can read more on this here: https://realpython.com/prevent-python-sql-injection/#understanding-python-sql-injection
Example:
# Vulnerable
cursor.execute("SELECT admin FROM users WHERE username = '" + username + "'")
# Safe
cursor.execute("SELECT admin FROM users WHERE username = %s", (username,))
As Amanzer correctly mentions in his reply, Python has mechanisms to pass parameters safely.
However, other elements in your query (table names and column names) cannot be passed as parameters (bind variables), because database drivers bind values only, not identifiers.
If these come from an untrusted source (or may in the future), you should validate them. This is good coding practice even if you are sure the source is trusted.
There are some options to do this safely:
You should limit your tables and columns based on positive validation: make sure that the only values allowed are the ones that are authorized.
If that's not possible (because these are user created?):
You should make sure table and column names are limited to a "safe" set of characters (alphanumerics, dashes, underscores...).
You should enquote the table and column names by adding double quotes around the objects. If you do this, you need to be careful to validate that there are no quotes in the name, and error out or escape the quotes. You also need to be aware that adding quotes will make the name case sensitive.
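The two answers above can be combined roughly as follows. This is only a sketch under stated assumptions: the ALLOWED mapping, the build_upsert name and the users table are hypothetical, the %s placeholder style is assumed to match the driver in use (as in the cursor.execute example above), and identifiers are wrapped in SQL Server's square brackets rather than double quotes. Identifiers are checked against an allow-list, and every value travels as a bound parameter.

```python
# Hypothetical allow-list: table name -> column names that may appear in a query.
ALLOWED = {"users": {"id", "name", "email"}}


def build_upsert(table, key_col, row):
    """Return (sql, params) for an upsert; row maps column name -> value."""
    if table not in ALLOWED:
        raise ValueError(f"table not allowed: {table!r}")
    columns = list(row)
    bad = [c for c in [key_col, *columns] if c not in ALLOWED[table]]
    if bad:
        raise ValueError(f"columns not allowed: {bad!r}")
    if key_col not in row:
        raise ValueError("row must contain the key column")
    update_cols = [c for c in columns if c != key_col]
    if not update_cols:
        raise ValueError("nothing to update besides the key column")

    # Only validated identifiers are interpolated; values stay as %s placeholders.
    col_list = ", ".join(f"[{c}]" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    set_list = ", ".join(f"[{c}] = %s" for c in update_cols)
    sql = (
        f"IF NOT EXISTS (SELECT 1 FROM [{table}] WHERE [{key_col}] = %s) "
        f"INSERT INTO [{table}] ({col_list}) VALUES ({placeholders}) "
        f"ELSE UPDATE [{table}] SET {set_list} WHERE [{key_col}] = %s"
    )
    # Parameter order follows the order of the placeholders in the statement.
    params = (
        [row[key_col]]
        + [row[c] for c in columns]
        + [row[c] for c in update_cols]
        + [row[key_col]]
    )
    return sql, tuple(params)


sql, params = build_upsert("users", "id", {"id": 1, "name": "x'; DROP TABLE users --"})
```

The pair would then be passed as cursor.execute(sql, params), so the hostile-looking name above ends up as data rather than as SQL text.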

How to select only date part in postgres by using django ORM

My backend setup is as follows:
Postgres 9.5
Python 2.7
Django 1.9
I have a table with a datetime column named createdAt. I want to use the Django ORM to select only the date part of this field and group by it.
For example, the createdAt field may store 2016-12-10 00:00:00+0, 2016-12-11 10:10:05+0, 2016-12-11 17:10:05+0, etc.
Using the Django ORM, the output should be 2016-12-10, 2016-12-11. The corresponding SQL should be similar to: SELECT date(createdAt) FROM TABLE GROUP BY createdAt.
Thanks.
You can try this:
Use the __date lookup to filter a DateTimeField by its date part: createAt__date. It is used like __lt or __gt.
Use annotate/Func/F to create an extra field, based on createAt, that holds only the date part.
Use values and annotate with Count/Sum to create the GROUP BY query.
Example code
from django.db import models
from django.db.models import Func, F, Count

class Post(models.Model):
    name = models.CharField('name', max_length=255)
    createAt = models.DateTimeField('create at', auto_now=True)

result = (
    Post.objects
    .filter(createAt__date='2016-12-26')  # keep only rows whose date part matches
    .annotate(createAtDate=Func(F('createAt'), function='date'))  # extra field holding the date part, like date(createAt) in SQL
    .values('createAtDate')  # group by createAtDate
    .annotate(count=Count('id'))  # aggregate function, like count(id) in SQL
)
Output:
[{'count': 2, 'createAtDate': datetime.date(2016, 12, 26)}]
Notes:
The statement is split across several lines so that each method can be commented; the enclosing parentheses let Python read it as a single expression, so it can be pasted as is.
When time zone support is turned on in your application (USE_TZ = True), you need to be more careful when comparing two dates in Django. The time zone matters when making a query like the one above.
Hope it would help. :)
Django provides Manager.raw() to perform raw queries on a model, but because a raw query must include the primary key, it is not useful in your case. When Manager.raw() is not enough, for example for queries that don't map cleanly to models, you can always access the database directly, routing around the model layer entirely, by using connection. The connection object represents the default database connection. So you can perform your query like this:
from django.db import connection
# acquire a cursor object by calling connection.cursor()
cursor = connection.cursor()
cursor.execute('SELECT date(created_at) FROM TABLE_NAME GROUP BY date(created_at)')
After executing the query, you can iterate over the cursor object to get the query result.
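Iterating over a cursor could look like the sketch below. It uses the standard-library sqlite3 module as a stand-in for Django's connection, on the assumption that both expose the usual DB-API cursor interface; the post table and its sample rows are invented for the illustration, and SQLite's date() function plays the role of the Postgres one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, created_at TEXT)")
cur.executemany(
    "INSERT INTO post (created_at) VALUES (?)",
    [("2016-12-10 00:00:00",), ("2016-12-11 10:10:05",), ("2016-12-11 17:10:05",)],
)

# Group on the date part, not on the full timestamp.
cur.execute(
    "SELECT date(created_at), COUNT(*) FROM post "
    "GROUP BY date(created_at) ORDER BY date(created_at)"
)

# A cursor is iterable: each item is one result row as a tuple.
rows = [(day, count) for day, count in cur]
print(rows)
```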

Handle multiple parameter in SQL query using MySQLdb

I have a scenario where I need to exclude a few thousand email addresses with specific domain names.
My current query is:
Select * from users where email NOT LIKE '%abc.com'
AND email NOT LIKE '%efg.com'
AND email NOT LIKE '%xyz.com'
When I moved to Python, I wrote the query like this:
MySQLcursor.execute("""SELECT * FROM users WHERE email NOT LIKE '%abc.com'
AND email NOT LIKE '%efg.com'
AND email NOT LIKE '%xyz.com'""")
Can I make a generic list of domains and exclude them?
What I tried is:
list_of_domains = ('%%abc.com', '%%xyz.com', '%%efg.com')
MySQLcursor.execute("SELECT * FROM users WHERE email NOT LIKE %(exclude_domain)s", {"exclude_domain": list_of_domains})
It seems to work only if there is a single value in list_of_domains, because when the tuple is unpacked, the one "email NOT LIKE" condition can be matched against only one domain.
How can I write a generic query so that, if I have new domains tomorrow, I simply add them to list_of_domains and it keeps working?
I am not sure whether this is possible. Can somebody help?
I don't like constructing queries using string interpolation and concatenation, but, assuming you cannot change the table schema and this is the query you have to do and you trust the source of the domain list, here is something to get you started:
domains = [...]
pattern = "email NOT LIKE '%{0}'"
conditions = " AND ".join([pattern.format(domain) for domain in domains])
query = "SELECT * FROM users WHERE " + conditions
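If the domain list cannot be fully trusted, a parameterized variant of the same idea is possible: generate one placeholder per domain and pass the patterns separately. This is a sketch, not the answerer's code; build_exclusion_query is a hypothetical helper, and the %s placeholder style is assumed to be the one MySQLdb accepts.

```python
def build_exclusion_query(domains):
    """Return (query, params) excluding every email that ends with one of the domains."""
    if not domains:
        return "SELECT * FROM users", []
    # One "email NOT LIKE %s" condition per domain; no domain text enters the SQL.
    conditions = " AND ".join(["email NOT LIKE %s"] * len(domains))
    # The LIKE wildcard is part of each parameter value.
    params = ["%" + domain for domain in domains]
    return "SELECT * FROM users WHERE " + conditions, params


query, params = build_exclusion_query(["abc.com", "xyz.com"])
```

The result would be run as MySQLcursor.execute(query, params), and adding a domain to the list adds a condition automatically.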

Django ORM limiting queryset to only return a subset of data

I have the following query in a Django app. The user field is a foreign key. The results may contain 1000 MyModel objects, but only for a handful of users. I'd like to limit it to 5 MyModel objects returned per user in the user__in= portion of the query. I should end up with 5*#users or less MyModel objects.
lfs = MyModel.objects.filter(
    user__in=[some,users,here,],
    active=True,
    follow=True,
)
Either through the ORM or SQL (using Postgres) would be acceptable.
Thanks
EDIT 2
Found a simpler way to get this done, which I've added as an answer below.
EDIT
Some of the links mentioned in the comments had some good information, although none really worked with Postgres or the Django ORM. For anyone else looking for this information in the future, my adaptation of the code in those other questions/answers is here.
To implement this in Postgres 9.1, I had to create a couple of functions using plperl (which also required me to install plperl):
CREATE OR REPLACE FUNCTION set_int_var(name text, val bigint) RETURNS bigint AS $$
    if ($_SHARED{$_[0]} = $_[1]) {
        return $_[1];
    } else {
        return $_[1];
    }
$$ LANGUAGE plperl;
CREATE OR REPLACE FUNCTION get_int_var(name text) RETURNS bigint AS $$
    return $_SHARED{$_[0]};
$$ LANGUAGE plperl;
And my final query looks something like the following
SELECT x.id, x.ranking, x.active, x.follow, x.user_id
FROM (
    SELECT tbl.id, tbl.active, tbl.follow, tbl.user_id,
        CASE WHEN get_int_var('user_id') != tbl.user_id
        THEN
            set_int_var('rownum', 1)
        ELSE
            set_int_var('rownum', get_int_var('rownum') + 1)
        END AS
        ranking,
        set_int_var('user_id', tbl.user_id)
    FROM my_table AS tbl
    WHERE tbl.active = TRUE AND tbl.follow=TRUE
    ORDER BY tbl.user_id
) AS x
WHERE x.ranking <= 5
ORDER BY x.user_id
LIMIT 50
The only downside to this is that if I try to limit the users that it looks for by using user_id IN (), the whole thing breaks and it just returns every row, rather than just 5 per user.
This is what ended up working, and allowed me to only select a handful of users, or all users (by removing the AND mt.user_id IN () line).
SELECT * FROM mytable
WHERE (id, user_id, follow, active) IN (
    SELECT id, user_id, follow, active FROM mytable mt
    WHERE mt.user_id = mytable.user_id
    AND mt.user_id IN (1, 2)
    ORDER BY user_id LIMIT 5)
ORDER BY likeable
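For comparison, a window function is another common way to express "at most N rows per user". This is a different technique from the queries above, shown only as a sketch: it runs against an in-memory SQLite database via the standard-library sqlite3 module (assuming an SQLite build with window-function support), and the table contents are invented. Postgres accepts the same ROW_NUMBER() OVER (PARTITION BY ...) construct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, active INTEGER, follow INTEGER)"
)
# Four rows for each of three users; ids are assigned 1..12 in insertion order.
sample = [(uid, 1, 1) for uid in (1, 2, 3) for _ in range(4)]
conn.executemany(
    "INSERT INTO mytable (user_id, active, follow) VALUES (?, ?, ?)", sample
)

PER_USER = 2  # rows to keep per user (5 in the question)
query = """
SELECT id, user_id FROM (
    SELECT id, user_id,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY id) AS ranking
    FROM mytable
    WHERE active = 1 AND follow = 1 AND user_id IN (?, ?)
) AS ranked
WHERE ranking <= ?
ORDER BY user_id, id
"""
result = conn.execute(query, (1, 2, PER_USER)).fetchall()
print(result)
```

The ORDER BY inside the OVER clause decides which rows count as the first N for each user.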
I think this is what you were looking for (I didn't see it in other posts):
https://docs.djangoproject.com/en/dev/topics/db/queries/#limiting-querysets
In other examples, they convert the queryset to a list before "slicing". If you do something like this (for example):
lfs = MyModel.objects.filter(
    user__in=[some,users,here,],
    active=True,
    follow=True,
)[:10]
the resulting SQL is a query with LIMIT 10 in its clauses.
So, the query you are looking for would be something like this:
mymodel_ids = []
for user in users:
    # take the first 5 ids for this user (one query per user)
    mymodel_5ids_for_user = MyModel.objects.filter(
        user=user,
        active=True,
        follow=True,
    ).values_list('id', flat=True)[:5]
    mymodel_ids.extend(mymodel_5ids_for_user)
lfs = MyModel.objects.filter(id__in=mymodel_ids)
so that lfs holds the MyModel objects you were looking for (5 entries per user).
I think the number of queries is at least one per user, plus one to retrieve all MyModel objects with that filter.
Be aware of the order in which the objects are returned. If you change the ordering of the "mymodel_5ids_for_user" query, the first 5 elements of the query could change.

django/python: raw sql with multiple tables

I need to perform a raw sql on multiple tables. I then render the result set. For one table I would do:
sql = "select * from my_table"
results = my_table.objects.raw(sql)
For multiple tables I am doing:
sql = "select * from my_table, my_other_table where ...."
results = big_model.objects.raw(sql)
But, do I really need to create a table/model/class big_model, which contains all fields that I may need? I will never actually store any data in this "table".
ADDED:
I have a table my_users. I have a table my_listings. These are defined in Models.py. The table my_listings has a foreign key to my_users, indicating who created the listing.
The SQL is
"select user_name, listing_text from my_listings, my_users where my_users.id = my_listings.my_user_id".
I want this SQL to generate a result set that I can use to render my page in django.
The question is: Do I have to create a model that contains the fields user_name and listing_text? Or is there some better way that still uses raw SQL (select, from, where)? Of course, my actual queries are more complicated than this example. (The models that I define in models.py become actual tables in the database hence the use of the model/table term. Not sure how else to refer to them, sorry.) I use raw sql because I found that python table references only work with simple data models.
This works. Don't know why it didn't before :( From Dennis Baker's comment:
You do NOT need to have a model with all the fields in it, you just need the first model and fields from that. You do need to have the fields with unique names and as far as I know you should use "tablename.field as fieldname" to make sure you have all unique fields. I've done some fairly complex queries with 5+ tables this way and always tie them back to a single model.
2. Another solution is to use a cursor. However, the cursor's result has to be converted from a list of tuples to a list of dictionaries. I'm sure there are cleaner ways using iterators, but this function works. It takes a string, which is the raw SQL query, and returns a list which can be rendered and used in a template.
from django.db import connection

def sql_select(sql):
    cursor = connection.cursor()
    cursor.execute(sql)
    results = cursor.fetchall()
    rows = []  # named rows/record so the built-in list and dict are not shadowed
    i = 0
    for row in results:
        record = {}
        field = 0
        while True:
            try:
                # map the column name to the value, converted to a string
                record[cursor.description[field][0]] = str(results[i][field])
                field = field + 1
            except IndexError:
                # ran past the last column of this row
                break
        i = i + 1
        rows.append(record)
    return rows
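The cleaner, iterator-based conversion might look like the sketch below. It pairs the column names from cursor.description with each row using zip. To keep it runnable on its own, it uses the standard-library sqlite3 module instead of Django's connection (both are assumed to follow the DB-API cursor interface); rows_as_dicts is a hypothetical name, the sample data is invented, and unlike the function above it leaves values in their original types rather than converting them to strings.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Convert the remaining rows of an executed cursor into a list of dicts."""
    # cursor.description holds one entry per column; item 0 is the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_users (id INTEGER PRIMARY KEY, user_name TEXT)")
conn.execute(
    "CREATE TABLE my_listings ("
    "id INTEGER PRIMARY KEY, my_user_id INTEGER, listing_text TEXT)"
)
conn.execute("INSERT INTO my_users VALUES (1, 'alice')")
conn.execute("INSERT INTO my_listings VALUES (10, 1, 'bike for sale')")

cur = conn.execute(
    "SELECT user_name, listing_text FROM my_listings "
    "JOIN my_users ON my_users.id = my_listings.my_user_id"
)
listings = rows_as_dicts(cur)
print(listings)
```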
You do not need a model that includes the fields that you want to return from your raw SQL. If you happen to have a model that actually has the fields that you want to return from your raw SQL, then you can map your raw SQL output to this model; otherwise you can use cursors to go around models altogether.
