Passing a value into sqlite3 in Python

I am trying to pass a method parameter into an sqlite command and I am having problems.
I have tried several different methods including:
c.execute('''SELECT ?, count(*)
FROM posts
GROUP BY 1''', (var,))
c.execute('''SELECT :variable, count(*)
FROM posts
GROUP BY 1''', {"variable" : var})
I have looked at the python docs on this and believe I am following them.
Both these methods don't select the columns, but return
[('lang', 3284469)]
Where lang is the name of the column and the value of the variable being passed. It should look more like
[('en', 3289)]
[('es', 845)]
....
[('ze', 39)]
The only method I have been able to get working is:
c.execute('''SELECT '''+var+''', count(*)
FROM posts
GROUP BY 1''')
I am not particularly happy with this. Is there a better way to do it? What am I doing wrong? Thanks

That is not how parameterized queries work. Parameters can only stand in for values, not for identifiers, so a placeholder can never be a column name or a table name. In your query the placeholder is bound as the string literal 'lang', so every row yields the same constant, GROUP BY 1 sees a single group, and you get one row back. Parameters typically appear where a value is expected, such as in the WHERE or HAVING clause.
So if you want the column name or the table name to be variable, you must build the query string dynamically, and when you want values in the WHERE clause to be dynamic, you use parameters:
c.execute('''SELECT '''+var+''', count(*) as total
FROM posts
GROUP BY 1
HAVING total > ?''', (minval,))
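Since the column name ends up concatenated into the SQL text, it is worth checking it against a fixed list first. Below is a minimal sketch of that idea using the standard sqlite3 module; the ALLOWED_COLUMNS set, the count_by_column helper, and the sample posts data are hypothetical and not taken from the question.

```python
import sqlite3

# Hypothetical whitelist: the only column names callers may group by.
ALLOWED_COLUMNS = {"lang", "author"}


def count_by_column(conn, column, min_total):
    """Count posts per value of `column`, keeping groups larger than min_total."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column name: {column!r}")
    # The identifier is interpolated only after validation;
    # the threshold is still bound as a regular parameter.
    sql = (
        f'SELECT "{column}", count(*) AS total FROM posts '
        f'GROUP BY "{column}" HAVING count(*) > ? ORDER BY total DESC'
    )
    return conn.execute(sql, (min_total,)).fetchall()


# Assumed sample data, only there to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (lang TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [("en", "a"), ("en", "b"), ("en", "c"), ("es", "a"), ("es", "b"), ("ze", "a")],
)
rows = count_by_column(conn, "lang", 1)
print(rows)
```

Anything outside the whitelist raises ValueError before any SQL is built, so user input never reaches the query text unchecked.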

Related

Why does COUNT(*) in my SQL query return several values? How to get one single value for total rows found?

I have a Django class getting some order data from a PostgreSQL database. I need to get the number of rows found in a query.
Trying to get the found rows count from the following query with COUNT(*):
When I print the result from the query above, I get a lot of data:
I only want to get a single number, the count of the total rows found and loaded by the select query above. How do I achieve this?
Keep in mind that I'm pretty new to SQL, so I might be missing something obvious to you.
Thanks!
COUNT(expr) will return a count of the number of non-null values of expr in the rows that are retrieved by the SELECT.
In your case, you're grouping rows together, so it returns one count for each grouped result row.
You'll probably get the result you're looking for by wrapping the query in a subselect.
For example:
SELECT COUNT(*)
FROM (SELECT o.* FROM your_table o ....) AS sub
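The original query is not shown, so the effect can only be illustrated on made-up data. The sketch below uses Python's sqlite3 module with a hypothetical orders table to contrast a grouped COUNT(*) with the same query wrapped in a subselect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the order data in the question.
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "ann"), (2, "ann"), (3, "bob"), (4, "cy")],
)

# With GROUP BY, COUNT(*) is evaluated once per group: one row per customer.
per_group = conn.execute(
    "SELECT COUNT(*) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

# Wrapping the grouped query in a subselect counts its result rows instead.
total = conn.execute(
    "SELECT COUNT(*) FROM (SELECT customer FROM orders GROUP BY customer) AS grouped"
).fetchone()[0]

print(per_group, total)
```

The first query returns as many rows as there are groups, while the wrapped version returns a single number.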

What is the Django query for the below SQL statement?

There is a simple SQL table with 3 columns: id, sku, last_update
and a very simple SQL statement: SELECT DISTINCT sku FROM product_data ORDER BY last_update ASC
What would be a django view code for the aforesaid SQL statement?
This code:
q = ProductData.objects.values('sku').distinct().order_by('sku')
returns 145 results
whereas this statement:
q = ProductData.objects.values('sku').distinct().order_by('last_update')
returns over 1000 results
Why is it so? Can someone, please, help?
Thanks a lot in advance!
The difference is that the first query returns a list of (sku) values, while the second returns a list of (sku, last_update) pairs. This is because any fields used in order_by() are also included in the SQL SELECT, so the distinct is applied to a different set of records, resulting in a different count.
Take a look at the queries Django generates; they should be something like the following:
Query #1
>>> str(ProductData.objects.values('sku').distinct().order_by('sku').query)
'SELECT DISTINCT "yourproject_productdata"."sku" FROM "yourproject_productdata" ORDER BY "yourproject_productdata"."sku" ASC'
Query #2
>>> str(ProductData.objects.values('sku').distinct().order_by('last_update').query)
'SELECT DISTINCT "yourproject_productdata"."sku", "yourproject_productdata"."last_update" FROM "yourproject_productdata" ORDER BY "yourproject_productdata"."last_update" ASC'
This behaviour is described in the distinct() documentation:
Any fields used in an order_by() call are included in the SQL SELECT
columns. This can sometimes lead to unexpected results when used in
conjunction with distinct(). If you order by fields from a related
model, those fields will be added to the selected columns and they may
make otherwise duplicate rows appear to be distinct. Since the extra
columns don’t appear in the returned results (they are only there to
support ordering), it sometimes looks like non-distinct results are
being returned.
Similarly, if you use a values() query to restrict the columns
selected, the columns used in any order_by() (or default model
ordering) will still be involved and may affect uniqueness of the
results.
The moral here is that if you are using distinct() be careful about
ordering by related models. Similarly, when using distinct() and
values() together, be careful when ordering by fields not in the
values() call.
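The effect is easy to reproduce at the SQL level. The sketch below uses Python's sqlite3 module with a few made-up rows in a product_data table (the rows are assumptions, not the asker's data) and also shows one possible way to get one row per sku while still ordering by update time, namely grouping and ordering by an aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_data (id INTEGER, sku TEXT, last_update TEXT)")
conn.executemany(
    "INSERT INTO product_data VALUES (?, ?, ?)",
    [
        (1, "A", "2020-01-01"),
        (2, "A", "2020-03-01"),
        (3, "B", "2020-02-01"),
    ],
)

# DISTINCT over sku alone: one row per sku.
by_sku = conn.execute(
    "SELECT DISTINCT sku FROM product_data ORDER BY sku"
).fetchall()

# DISTINCT over (sku, last_update): repeated skus survive.
by_pair = conn.execute(
    "SELECT DISTINCT sku, last_update FROM product_data ORDER BY last_update"
).fetchall()

# One row per sku, ordered by each sku's earliest update.
grouped = conn.execute(
    "SELECT sku FROM product_data GROUP BY sku ORDER BY MIN(last_update)"
).fetchall()

print(by_sku, by_pair, grouped)
```

In the ORM, something along the lines of ProductData.objects.values('sku').annotate(first_update=Min('last_update')).order_by('first_update') should produce a comparable GROUP BY query, though that is untested here.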

Python mysql using variable to select a certain field

Having a little tricky issue with python and mysql. To keep it simple, the following code returns whatever is in the variable 'field', which is a string. Such as 'username' or 'password'.
options = [field, userID]
entries = cursor.execute('select (?) from users where id=(?)', options).fetchall()
print(entries)
This code works correctly if I remove the first (?) and just use the actual name (like 'username') instead. Can anyone provide some input?
Your query is actually formed as:
select "field" from users where id="value"
which returns you a string "field" instead of the actual table field value.
You cannot parameterize column and table names (docs):
Parameter placeholders can only be used to insert column values. They
can not be used for other parts of SQL, such as table names,
statements, etc.
Use string formatting for that part, making sure field comes from a trusted set of column names rather than raw user input:
options = [userID]
query = 'select {field} from users where id=(?)'.format(field=field)
cursor.execute(query, options).fetchall()
Related threads with some more explanations:
pysqlite: Placeholder substitution for column or table names?
Python MySQLdb: Query parameters as a named dictionary
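One possible way to make the string-formatting approach safer is to check the requested name against the table's real columns before building the query. The sketch below assumes sqlite3 (matching the ? placeholders used in the question); the select_field helper and the sample users row are hypothetical.

```python
import sqlite3


def select_field(conn, field, user_id):
    """Return the value of one column of `users` for a given id."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(users)")}
    if field not in columns:
        raise ValueError(f"unknown column: {field!r}")
    # Only a validated identifier is formatted in; the id stays a parameter.
    query = 'SELECT "{}" FROM users WHERE id = ?'.format(field)
    return conn.execute(query, (user_id,)).fetchall()


# Assumed sample data, only there to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice', 'secret')")
entries = select_field(conn, "username", 1)
print(entries)
```

With a MySQL driver the placeholder style and the way to list columns would differ, so treat this as an illustration of the validation step only.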

SqlAlchemy select with max, group_by and order_by

I have to list the last modified resources for each group, for that I can do this query:
model.Session.query(
model.Resource, func.max(model.Resource.last_modified)
).group_by(model.Resource.resource_group_id).order_by(
model.Resource.last_modified.desc())
But SqlAlchemy complains with:
ProgrammingError: (ProgrammingError) column "resource.id" must appear in
the GROUP BY clause or be used in an aggregate function
How can I select only the resource_group_id and last_modified columns?
In SQL what I want is this:
SELECT resource_group_id, max(last_modified) AS max_1
FROM resource GROUP BY resource_group_id ORDER BY max_1 DESC
model.Session.query(
model.Resource.resource_group_id, func.max(model.Resource.last_modified)
).group_by(model.Resource.resource_group_id).order_by(
func.max(model.Resource.last_modified).desc())
You already got it, but I'll try to explain what's going on with the original query for future reference.
In SQLAlchemy, passing a model class, as in query(model.Resource, ...), makes the generated SELECT statement list every column of the resource table, so your original query would look something like:
SELECT resource.resource_group_id AS resource_group_id,
resource.extra_column1 AS extra_column1,
resource.extra_column2 AS extra_column2,
...
max(resource.last_modified) AS max_1
FROM resource
GROUP BY resource.resource_group_id ORDER BY resource.last_modified DESC;
PostgreSQL rejects this: every selected column that is not inside an aggregate function must also appear in the GROUP BY clause, which is exactly what the error message says.
A common way to avoid this is to name the columns you want explicitly in the query call, e.g. .query(model.Resource.resource_group_id, func.max(model.Resource.last_modified)).
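If the full resource row behind each maximum is needed, one common SQL pattern is to join the table back to the grouped subquery. The sketch below shows both the desired aggregate query and that join, using Python's sqlite3 module; the table layout follows the question, while the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource (id INTEGER, resource_group_id INTEGER, last_modified TEXT)"
)
conn.executemany(
    "INSERT INTO resource VALUES (?, ?, ?)",
    [
        (1, 10, "2021-01-01"),
        (2, 10, "2021-05-01"),
        (3, 20, "2021-03-01"),
    ],
)

# Select only the grouping column and the aggregate, as in the desired SQL.
latest_per_group = conn.execute(
    """
    SELECT resource_group_id, max(last_modified) AS max_1
    FROM resource
    GROUP BY resource_group_id
    ORDER BY max_1 DESC
    """
).fetchall()

# Join back to the table to recover the full row behind each maximum.
latest_rows = conn.execute(
    """
    SELECT r.id, r.resource_group_id, r.last_modified
    FROM resource AS r
    JOIN (
        SELECT resource_group_id, max(last_modified) AS max_1
        FROM resource
        GROUP BY resource_group_id
    ) AS latest
      ON latest.resource_group_id = r.resource_group_id
     AND latest.max_1 = r.last_modified
    ORDER BY r.last_modified DESC
    """
).fetchall()

print(latest_per_group, latest_rows)
```

If two resources in a group share the same last_modified value, the join returns both of them.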

Query syntax to select exactly one item for each category

class Category(models.Model):
    pass
class Item(models.Model):
    cat = models.ForeignKey(Category)
I want to select exactly one item for each category. What is the query syntax to do this?
Your question isn't entirely clear: since you didn't say otherwise, I'm going to assume that you don't care which item is selected for each category, just that you need any one. If that isn't the case, please update the question to clarify.
tl;dr version: there is no documented way to explicitly use GROUP BY statements in Django, except by using a raw query. See the bottom for code to do so.
The problem is that doing what you're looking for in SQL itself requires a bit of a hack. You can easily try this example by entering sqlite3 :memory: at the command line:
CREATE TABLE category
(
id INT
);
CREATE TABLE item
(
id INT,
category_id INT
);
INSERT INTO category VALUES (1);
INSERT INTO category VALUES (2);
INSERT INTO category VALUES (3);
INSERT INTO item VALUES (1,1);
INSERT INTO item VALUES (2,2);
INSERT INTO item VALUES (3,3);
INSERT INTO item VALUES (4,1);
INSERT INTO item VALUES (5,2);
SELECT id, category_id, COUNT(category_id) FROM item GROUP BY category_id;
returns
4|1|2
5|2|2
3|3|1
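This is what you're looking for (one item id for each category id), albeit with an extraneous COUNT. Note that SQLite is lenient here: id is neither aggregated nor listed in the GROUP BY, so SQLite returns the id of an arbitrary row from each group, whereas stricter databases such as PostgreSQL reject such a query.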
Note: this will ignore categories that don't contain any items, which seems like sensible behaviour.
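A variant that does not depend on how SQLite picks the id is to aggregate it explicitly, for example with MIN(id). A minimal sketch using Python's sqlite3 module and the same sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INT, category_id INT)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 3), (4, 1), (5, 2)],
)

# MIN(id) picks a well-defined item per category (the one with the lowest id).
rows = conn.execute(
    "SELECT MIN(id), category_id FROM item GROUP BY category_id ORDER BY category_id"
).fetchall()
print(rows)
```

Here every selected column is either aggregated or part of the GROUP BY, so the query is valid standard SQL and the chosen item is deterministic.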
Now the question becomes, how to do this in Django?
The obvious answer is to use Django's aggregation/annotation support, in particular combining annotate with values, as is recommended elsewhere for GROUP BY queries in Django.
Reading those posts, it would seem we could accomplish what we're looking for with
Item.objects.values('id').annotate(unneeded_count=Count('category_id'))
However, this doesn't work. Django does not just GROUP BY "category_id" here; it groups by all selected fields (i.e. GROUP BY "id", "category_id"; see note 1). I don't believe there is a way (in the public API, at least) to change this behaviour.
The solution is to fall back to raw SQL:
qs = Item.objects.raw('SELECT *, COUNT(category_id) FROM myapp_item GROUP BY category_id')
1: Note that you can inspect which queries Django is running (they are recorded only when DEBUG is True) with:
from django.db import connection
print(connection.queries[-1])
Edit:
There are a number of other possible approaches, but most have (possibly severe) performance problems. Here are a couple:
1. Select an item from each category.
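items = []
for c in Category.objects.all():
    item = c.item_set.first()  # None if the category has no items
    if item is not None:
        items.append(item)
This is a clearer and more flexible approach, but has the obvious disadvantage of requiring many more database hits (one extra query per category).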
2. Use select_related
items = Item.objects.select_related()
and then do the grouping/filtering yourself (in Python).
Again, this is perhaps clearer than using raw SQL and only requires one query, but this one query could be very large (it will return all items and their categories), and doing the grouping/filtering yourself is probably less efficient than letting the database do it for you.
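The Python-side grouping could look roughly like the sketch below. It works on plain (item_id, category_id) tuples standing in for the rows the single large query would return; the first_item_per_category helper is hypothetical.

```python
def first_item_per_category(pairs):
    """Keep the first item seen for each category.

    `pairs` is an iterable of (item_id, category_id) tuples, standing in
    for the rows a single large query would return.
    """
    chosen = {}
    for item_id, category_id in pairs:
        # setdefault stores the item only if the category is not there yet.
        chosen.setdefault(category_id, item_id)
    return chosen


pairs = [(1, 1), (2, 2), (3, 3), (4, 1), (5, 2)]
result = first_item_per_category(pairs)
print(result)
```

The whole result set still has to be transferred and iterated, which is the cost described above.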
