Difference between different ways of updating a single row - python

I'm looking for the "standard" way of modifying a single row from a given table. (I want to modify several values within the row.)
I've found many ways and I struggle to understand the impact of this choice. This answer is an example of what I'm not looking for, but also an example of the many ways of doing the same thing.
SQLAlchemy seems to be a well thought-out tool, so I guess the several ways aren't just a design accident; there must be a benefit/cost to every solution. I'm looking for this information but I can't find it anywhere.
For the sake of the example, consider that I want to update the profile of a given user with ID user_id. I can for example write the following statements that (in appearance) have the same effect:
profile = {'display_name': 'test', 'age': 34}

# 1)
user_profile = query.get(user_id)
for key, value in profile.items():
    setattr(user_profile, key, value)
database.session.add(user_profile)

# 2)
user_profile = query.filter_by(id=user_id)
user_profile.update({**profile})

# 3)
with database.engine.connect() as conn:
    stmt = (
        sqlalchemy.update(models.UserProfile)
        .values(**profile)
        .where(models.UserProfile.id == user_id)
    )
    conn.execute(stmt)
In general, what is the performance impact of the different ways of updating a single row using SQLAlchemy?
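One way to ground the comparison is to look at the SQL each variant emits. A minimal sketch (not from the original question), assuming the models.UserProfile class and user_id from above; the exact table name in the output depends on the model:
import sqlalchemy

# Compiling the Core-style statement (variant 3) shows exactly what will be sent to the database
stmt = (
    sqlalchemy.update(models.UserProfile)
    .values(display_name='test', age=34)
    .where(models.UserProfile.id == user_id)
)
print(stmt)  # e.g. UPDATE user_profile SET display_name=:display_name, age=:age WHERE user_profile.id = :id_1

For the ORM variants, creating the engine with echo=True logs the statements they emit at flush time, which makes the differences concrete.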

Related

How to filter on calculated column of a query and meanwhile preserve mapped entities

I have a query which selects an entity A and some calculated fields
q = session.query(Recipe, func.avg(Recipe.somefield)).join(.....)
I then use what I select in a way which assumes I can subscript result with "Recipe" string:
for entry in q.all():
    recipe = entry.Recipe  # Access KeyedTuple by Recipe attribute
    ...
Now I need to wrap my query in an additional select, say to filter by calculated field AVG:
q = q.subquery()
q = session.query(q).filter(q.c.avg_1 > 1)
And now I cannot access entry.Recipe anymore!
Is there a way to make SQLAlchemy adapt a query to an enclosing one, like aliased(adapt_on_names=True) or select_from_entity()?
I tried using those but was given an error.
As Michael Bayer mentioned in a relevant Google Group thread, such adaptation is already done via the Query.from_self() method. My problem was that in this case I didn't know how to refer to the column I wanted to filter on.
This is due to the fact that it is calculated, i.e. there is no table column to refer to!
I might resort to using literals (.filter('avg_1 > 10')), but I'd prefer to stay in the more ORM-like style.
So, this is what I came up with: an explicit column expression
row_number_column = func.row_number().over(
    partition_by=Recipe.id
).label('row_number')

query = query.add_column(
    row_number_column
)
query = query.from_self().filter(row_number_column == 1)
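The same labeled-column trick should carry over to the original average filter; here is a minimal sketch under that assumption, reusing the Recipe model and session from the question (avg_column is a name introduced here):
from sqlalchemy import func

avg_column = func.avg(Recipe.somefield).label('avg_somefield')

# Select the entity plus the labeled aggregate, one row per recipe
query = session.query(Recipe, avg_column).group_by(Recipe.id)

# from_self() adapts both Recipe and the labeled column into the enclosing query,
# so the filter and entry.Recipe keep working
query = query.from_self().filter(avg_column > 1)

for entry in query.all():
    recipe = entry.Recipe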

Form a variable from a string without dictionary

This is sort of a duplicate, maybe, but I haven't found a solution that fits my needs.
I have a function which builds django-tables2. I pass in a parameter of a model_class as a string. Right now that can be 'application', 'database', or 'item', but there will eventually be many more. In order to configure the table, I set:
table = DatabaseTable(assets)
I would like to be able to pass in the model_class so that the table is dynamic based on the type of input it's getting. Something like:
table = model_class + Table(assets) (pseudo)
I don't want to hardcode the dictionary as I'm not sure how many values will eventually be added. I see people suggesting using exec, but those suggestions are all followed by other users threatening to murder those people's families.
This has been asked many times on stack and the answer is always the same: when you want an association between names and objects, the one obvious way to do it is with a dictionary.
I don't want to hardcode the dictionary as I'm not sure how many values will eventually be added.
So you want to avoid a situation like this
lookup = {'application': ApplicationTable,
          'database': DatabaseTable,
          'item': ItemTable}
Because you don't want to have to maintain the lookup, right? That's fine, but it is not necessary to hardcode a dictionary like this in order to dynamically retrieve the objects.
If your eval is working as you've said it is below, then those names in scope, like ApplicationTable, ItemTable, etc., must be coming from somewhere. Since I don't know where they are being defined and where new ones are going to appear, I will assume they are being imported from some module called MyTables:
from MyTables import ApplicationTable, ItemTable, BlahTable ...
The trick here is to use the dict (read: namespace) from the module itself.
import MyTables
my_string = 'item'
table = getattr(MyTables, my_string.capitalize() + 'Table')
Is this allowed? It works.
table = eval(model_name.capitalize() + 'Table(assets)')
RequestConfig(request).configure(table)
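Putting the getattr() approach together with the table construction from the question, a rough sketch could look like this (MyTables and build_table are hypothetical names; assets and request come from the question's view):
import MyTables  # hypothetical module defining ApplicationTable, DatabaseTable, ItemTable, ...

def build_table(model_class, assets):
    # Resolve e.g. 'database' -> MyTables.DatabaseTable without eval()
    table_cls = getattr(MyTables, model_class.capitalize() + 'Table')
    return table_cls(assets)

table = build_table('database', assets)
RequestConfig(request).configure(table)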

Query syntax to select exactly one item for each category

class Category(models.Model):
    pass

class Item(models.Model):
    cat = models.ForeignKey(Category)
I want to select exactly one item for each category; what is the query syntax for doing this?
Your question isn't entirely clear: since you didn't say otherwise, I'm going to assume that you don't care which item is selected for each category, just that you need any one. If that isn't the case, please update the question to clarify.
tl;dr version: there is no documented way to explicitly use GROUP BY statements in Django, except by using a raw query. See the bottom for code to do so.
The problem is that doing what you're looking for in SQL itself requires a bit of a hack. You can easily try this example by entering sqlite3 :memory: at the command line:
CREATE TABLE category
(
    id INT
);
CREATE TABLE item
(
    id INT,
    category_id INT
);
INSERT INTO category VALUES (1);
INSERT INTO category VALUES (2);
INSERT INTO category VALUES (3);
INSERT INTO item VALUES (1,1);
INSERT INTO item VALUES (2,2);
INSERT INTO item VALUES (3,3);
INSERT INTO item VALUES (4,1);
INSERT INTO item VALUES (5,2);
SELECT id, category_id, COUNT(category_id) FROM item GROUP BY category_id;
returns
4|1|2
5|2|2
3|3|1
Which is what you're looking for (one item id for each category id), albeit with an extraneous COUNT. The count (or some other aggregate function) is needed in order to apply the GROUP BY.
Note: this will ignore categories that don't contain any items, which seems like sensible behaviour.
Now the question becomes, how to do this in Django?
The obvious answer is to use Django's aggregation/annotation support, in particular combining annotate with values, as is recommended elsewhere for GROUP BY queries in Django.
Reading those posts, it would seem we could accomplish what we're looking for with
Item.objects.values('id').annotate(unneeded_count=Count('category_id'))
However, this doesn't work. What Django does here is not just GROUP BY "category_id", but groups by all selected fields (i.e. GROUP BY "id", "category_id")[1]. I don't believe there is a way (in the public API, at least) to change this behaviour.
The solution is to fall back to raw SQL:
qs = Item.objects.raw('SELECT *, COUNT(category_id) FROM myapp_item GROUP BY category_id')
[1]: Note that you can inspect what queries Django is running with:
from django.db import connection
print(connection.queries[-1])
Edit:
There are a number of other possible approaches, but most have (possibly severe) performance problems. Here are a couple:
1. Select an item from each category.
items = []
for c in Category.objects.all():
    items.append(c.item_set.all()[0])
This is a clearer and more flexible approach, but has the obvious disadvantage of requiring many more database hits.
2. Use select_related
items = Item.objects.select_related()
and then do the grouping/filtering yourself (in Python).
Again, this is perhaps clearer than using raw SQL and only requires one query, but that one query could be very large (it will return all items and their categories), and doing the grouping/filtering yourself is probably less efficient than letting the database do it for you.
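A minimal sketch of that in-Python grouping, assuming the Category/Item models above (one_per_category is a name introduced here for illustration):
# Keep the first item seen for each category, grouping in Python (approach 2)
one_per_category = {}
for item in Item.objects.select_related('cat'):
    one_per_category.setdefault(item.cat_id, item)

items = list(one_per_category.values())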

Batch select with SQLAlchemy

I have a large set of values V, some of which are likely to exist in a table T. I would like to insert into the table those which are not yet inserted. So far I have the code:
to_insert = []
for value in values:
    s = self.conn.execute(mytable.__table__.select(mytable.value == value)).first()
    if not s:
        to_insert.append(value)
I feel like this is running slower than it should. I have a few related questions:
Is there a way to construct a select statement such that you provide a list (in this case, 'values') to which sqlalchemy responds with records which match that list?
Is this code overly expensive in constructing select objects? Is there a way to construct a single select statement, then parameterize at execution time?
For the first question, something like this should work, if I understand your question correctly:
mytable.__table__.select(mytable.value.in_(values))
For the second question: querying this one row at a time is overly expensive indeed, although you might not have a choice in the matter. As far as I know there is no tuple-select support in SQLAlchemy, so if there are multiple variables (think polymorphic keys) then SQLAlchemy can't help you.
Either way, if you select all matching rows and insert the difference you should be done :)
Something like this should work:
results = self.conn.execute(mytable.__table__.select(mytable.value.in_(values)))
available_values = set(row.value for row in results)
to_insert = set(values) - available_values
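For completeness, the "insert the difference" step could then be a single executemany-style call. A rough sketch, assuming the column backing mytable.value is named value:
# Insert only the values that were not already present
if to_insert:
    self.conn.execute(
        mytable.__table__.insert(),
        [{'value': v} for v in to_insert]
    )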

In Django, how does one filter a QuerySet with dynamic field lookups?

Given a class:
from django.db import models
class Person(models.Model):
    name = models.CharField(max_length=20)
Is it possible, and if so how, to have a QuerySet that filters based on dynamic arguments? For example:
# Instead of:
Person.objects.filter(name__startswith='B')
# ... and:
Person.objects.filter(name__endswith='B')
# ... is there some way, given:
filter_by = '{0}__{1}'.format('name', 'startswith')
filter_value = 'B'
# ... that you can run the equivalent of this?
Person.objects.filter(filter_by=filter_value)
# ... which will throw an exception, since `filter_by` is not
# an attribute of `Person`.
Python's argument expansion may be used to solve this problem:
kwargs = {
    '{0}__{1}'.format('name', 'startswith'): 'A',
    '{0}__{1}'.format('name', 'endswith'): 'Z',
}
Person.objects.filter(**kwargs)
This is a very common and useful Python idiom.
A simplified example:
In a Django survey app, I wanted an HTML select list showing registered users. But because we have 5000 registered users, I needed a way to filter that list based on query criteria (such as just people who completed a certain workshop). In order for the survey element to be re-usable, I needed for the person creating the survey question to be able to attach those criteria to that question (don't want to hard-code the query into the app).
The solution I came up with isn't 100% user friendly (requires help from a tech person to create the query) but it does solve the problem. When creating the question, the editor can enter a dictionary into a custom field, e.g.:
{'is_staff':True,'last_name__startswith':'A',}
That string is stored in the database. In the view code, it comes back in as self.question.custom_query. The value of that is a string that looks like a dictionary. We turn it back into a real dictionary with eval() and then stuff it into the queryset with **kwargs:
kwargs = eval(self.question.custom_query)
user_list = User.objects.filter(**kwargs).order_by("last_name")
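The same parsing step can also be done with ast.literal_eval, which only accepts literal Python structures such as the stored dict; a hedged sketch, assuming custom_query holds a dict literal like the one above:
import ast

# Parse the stored dict literal instead of eval()-ing arbitrary code
kwargs = ast.literal_eval(self.question.custom_query)
user_list = User.objects.filter(**kwargs).order_by("last_name")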
To extend the previous answer, which drew some requests for further code, here is some working code that I use with Q. Let's say that in my request it is possible to have, or not have, filters on fields like:
publisher_id
date_from
date_until
Those fields can appear in the query, but they may also be missing.
This is how I build filters based on those fields for an aggregated query that cannot be further filtered after the initial queryset execution:
# prepare filters to apply to queryset
filters = {}
if publisher_id:
    filters['publisher_id'] = publisher_id
if date_from:
    filters['metric_date__gte'] = date_from
if date_until:
    filters['metric_date__lte'] = date_until

filter_q = Q(**filters)
queryset = Something.objects.filter(filter_q)...
Hope this helps, since I've spent quite some time digging this up.
Edit:
As an additional benefit, you can use lists too. For the previous example, if instead of publisher_id you have a list called publisher_ids, then you could use this piece of code:
if publisher_ids:
    filters['publisher_id__in'] = publisher_ids
django.db.models.Q is exactly what you want, in a Django way.
This looks much more understandable to me:
kwargs = {
    'name__startswith': 'A',
    'name__endswith': 'Z',
    # add more filters here
}
Person.objects.filter(**kwargs)
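Where Q really earns its keep is when the dynamic lookups need to be OR-ed together rather than AND-ed; a minimal sketch, assuming the Person model above (lookups is a name introduced here):
import operator
from functools import reduce

from django.db.models import Q

lookups = {'name__startswith': 'A', 'name__endswith': 'Z'}

# Build one Q object per lookup and OR them all together
q_filter = reduce(operator.or_, (Q(**{field: value}) for field, value in lookups.items()))
Person.objects.filter(q_filter)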
A really complex search form usually indicates that a simpler model is trying to dig its way out.
How, exactly, do you expect to get the values for the column name and operation?
Where do you get the values of 'name' and 'startswith'?
filter_by = '%s__%s' % ('name', 'startswith')
A "search" form? You're going to -- what? -- pick the name from a list of names? Pick the operation from a list of operations? While open-ended, most people find this confusing and hard-to-use.
How many columns have such filters? 6? 12? 18?
A few? A complex pick-list doesn't make sense. A few fields and a few if-statements make sense.
A large number? Your model doesn't sound right. It sounds like the "field" is actually a key to a row in another table, not a column.
Specific filter buttons. Wait... That's the way the Django admin works. Specific filters are turned into buttons. And the same analysis as above applies. A few filters make sense. A large number of filters usually means a kind of first normal form violation.
A lot of similar fields often means there should have been more rows and fewer fields.
