Is it possible to control which columns are queried in the query method of SQLAlchemy, while still returning instances of the object you are querying (albeit partially populated)?
Or is it necessary for SQLAlchemy to perform a SELECT * to map to an object?
(I do know that querying individual columns is available, but it does not map the result to an object, only to a component of a named tuple).
For example, if the User object has the attributes userid, name, password, and bio, but you want the query to only fill in userid and name for the objects it returns:
# hypothetical syntax, of course:
for u in session.query(User.columns[userid, name]).all():
print u
would print:
<User(1, 'bob', None, None)>
<User(2, 'joe', None, None)>
...
Is this possible; if so, how?
A simple solution that worked for me was:
users = session.query(User.userid, User.name)
for user in users:
print user
would print:
<User(1, 'bob')>
<User(2, 'joe')>
...
you can query for individual columns, which returns named tuples that do in fact act pretty much like your mapped object if you're just passing off to a template or something:
http://www.sqlalchemy.org/docs/orm/tutorial.html#querying
or you can establish various columns on the mapped class as "deferred", either configurationally or using options:
http://docs.sqlalchemy.org/en/latest/orm/loading_columns.html#deferred-column-loading
there's an old ticket in trac for something called "defer_everything_but()", if someone felt like providing tests and such there's no reason that couldn't be a feature add, here's a quick version:
from sqlalchemy.orm import class_mapper, defer
def defer_everything_but(entity, cols):
m = class_mapper(entity)
return [defer(k) for k in
set(p.key for p
in m.iterate_properties
if hasattr(p, 'columns')).difference(cols)]
s = Session()
print s.query(A).options(*defer_everything_but(A, ["q", "p"]))
defer() should really accept multiples, added ticket #2250 for that (edit: as noted in the comment this is in 0.9 as load_only())
Latest doc for load_only is here
http://docs.sqlalchemy.org/en/latest/orm/loading_columns.html#load-only-cols
If you're looking at a way to control that at model definition level, use deferred
http://docs.sqlalchemy.org/en/latest/orm/loading_columns.html#deferred
Related
I commonly find myself writing the same criteria in my Django application(s) more than once. I'll usually encapsulate it in a function that returns a Django Q() object, so that I can maintain the criteria in just one place.
I will do something like this in my code:
def CurrentAgentAgreementCriteria(useraccountid):
'''Returns Q that finds agent agreements that gives the useraccountid account current delegated permissions.'''
AgentAccountMatch = Q(agent__account__id=useraccountid)
StartBeforeNow = Q(start__lte=timezone.now())
EndAfterNow = Q(end__gte=timezone.now())
NoEnd = Q(end=None)
# Now put the criteria together
AgentAgreementCriteria = AgentAccountMatch & StartBeforeNow & (NoEnd | EndAfterNow)
return AgentAgreementCriteria
This makes it so that I don't have to think through the DB model more than once, and I can combine the return values from these functions to build more complex criterion. That works well so far, and has saved me time already when the DB model changes.
Something I have realized as I start to combine the criterion from these functions that is that a Q() object is inherently tied to the type of object .filter() is being called on. That is what I would expect.
I occasionally find myself wanting to use a Q() object from one of my functions to construct another Q object that is designed to filter a different, but related, model's instances.
Let's use a simple/contrived example to show what I mean. (It's simple enough that normally this would not be worth the overhead, but remember that I'm using a simple example here to illustrate what is more complicated in my app.)
Say I have a function that returns a Q() object that finds all Django users, whose username starts with an 'a':
def UsernameStartsWithAaccount():
return Q(username__startswith='a')
Say that I have a related model that is a user profile with settings including whether they want emails from us:
class UserProfile(models.Model):
account = models.OneToOneField(User, unique=True, related_name='azendalesappprofile')
emailMe = models.BooleanField(default=False)
Say I want to find all UserProfiles which have a username starting with 'a' AND want use to send them some email newsletter. I can easily write a Q() object for the latter:
wantsEmails = Q(emailMe=True)
but find myself wanting to something to do something like this for the former:
startsWithA = Q(account=UsernameStartsWithAaccount())
# And then
UserProfile.objects.filter(startsWithA & wantsEmails)
Unfortunately, that doesn't work (it generates invalid PSQL syntax when I tried it).
To put it another way, I'm looking for a syntax along the lines of Q(account=Q(id=9)) that would return the same results as Q(account__id=9).
So, a few questions arise from this:
Is there a syntax with Django Q() objects that allows you to add "context" to them to allow them to cross relational boundaries from the model you are running .filter() on?
If not, is this logically possible? (Since I can write Q(account__id=9) when I want to do something like Q(account=Q(id=9)) it seems like it would).
Maybe someone suggests something better, but I ended up passing the context manually to such functions. I don't think there is an easy solution, as you might need to call a whole chain of related tables to get to your field, like table1__table2__table3__profile__user__username, how would you guess that? User table could be linked to table2 too, but you don't need it in this case, so I think you can't avoid setting the path manually.
Also you can pass a dictionary to Q() and a list or a dictionary to filter() functions which is much easier to work with than using keyword parameters and applying &.
def UsernameStartsWithAaccount(context=''):
field = 'username__startswith'
if context:
field = context + '__' + field
return Q(**{field: 'a'})
Then if you simply need to AND your conditions you can combine them into a list and pass to filter:
UserProfile.objects.filter(*[startsWithA, wantsEmails])
Is there a more elegant/faster way to do this?
Here are my models:
class X(Model):
(...)
class A(Model):
xs = ManyToManyField(X)
class B(A):
(...)
class C(A): # NOTE: not relevant here
(...)
class Y(Model):
b = ManyToOneField(B)
Here is what I want to do:
def foo(x):
# NOTE: b's pk (b.a_ptr) is actually a's pk (a.id)
return Y.objects.filter(b__in=x.a_set.all())
But it returns the following error:
<repr(<django.db.models.query.QuerySet at 0x7f6f2c159a90>) failed:
django.core.exceptions.FieldError: Cannot resolve keyword u'a_ptr'
into field. Choices are: ...enumerate all fields of A...>
And here is what I'm doing right now in order to minimize the queries:
def foo(x):
a_set = x.a_set
a_set.model = B # NOTE: force the model to be B
return Y.filter(b__in=a_set.all())
It works but it's not one line. It would be cool to have something like this:
def foo(x):
return Y.filter(b__in=x.a_set._as(B).all())
The simpler way in Django seems to be the following:
def foo(x):
return Y.filter(b__in=B.objects.filter(pk__in=x.a_set.all()))
...but this makes useless sub-queries in SQL:
SELECT Y.* FROM Y WHERE Y.b IN (
SELECT B.a_ptr FROM B WHERE B.a_ptr IN (
SELECT A.id FROM A WHERE ...));
This is what I want:
SELECT Y.* FROM Y WHERE Y.b IN (SELECT A.id FROM A WHERE ...);
(This SQL example is a bit simplified because the relation between A and X is actually a ManyToMany table, I substituted this with SELECT A.id FROM A WHERE ... for clarity sake.)
You might get a better result if you follow the relationship one more step, rather than using in.
Y.objects.filter(b__xs=x)
I know next to nothing about databases, but there are many warnings in the Django literature about concrete inheritance class B(A) causing database inefficiency because of repeated INNER JOIN operations.
The underlying problem is that relational databases aren't object-orientated.
If you use Postgres, then Django has native support (from 1.8) for Hstore fields, which are searchable but schema-less collections of keyword-value pairs. I intend to be using one of these in my base class (your A) to eliminate any need for derived classes (your B and C). Instead, I'll maintain a type field in A (equal to "B" or "C" in your example) and use that and/or careful key presence checking to access subfields of the Hstore field which are "guaranteed" to exist if (say) instance.type=="C" (which constraint isn't enforced at the DB integrity level).
I also discovered the django-hstore module which existed prior to Django 1.8, and which has even greater functionality. I've yet to do anything with it so cannot comment further.
I also found reference to wide tables in the literature, where you simply define a model A with a potentially large number of fields which contain useful data only if the object is of type B, and more yet fields for types C,D,... I don't have the DB knowledge to assess the cost of lots of blank or null fields attached to every row of a table. Clearly if you have a million sheep records and one pedigree-racehorse record and one stick-insect record, as "subclasses" of Animal in the same wide table, it gets silly.
I'd appreciate feedback on either idea if you have the DB understanding that I lack.
Oh, and for completeness, there's the django-polymorphic module, but that builds on concrete inheritance with its attendant inefficiencies.
Sorry for noobster question again.
But I'm trying to do some very easy stuff here, and I don't know how. Documentation gives me hints which do not work, or apply.
I recieve a POST request and grab a variable out of it. It says "name".
I have to search all over my entities Object (for example) and find out if there's one that has the same name. Is there's none, I must create a new Entity with this name. Easy it may look, but I keep Failing.
Would really appreciate any help.
My code currently is this one:
objects_qry = Object.query(Object.name == data["name"])
if (not objects_qry ):
obj = Object()
obj .name = data["name"]
obj .put()
class Object(ndb.Model):
name = ndb.StringProperty()
Using a query to perform this operation is really inefficient.
In addition your code is possibly unreliable, if name doesn't exist and you have two requests at the same time for name you could end up with two records. And you can't tell because your query only returns the first entity with the name property equal to some value.
Because you expect only one entity for name a query is expensive and inefficient.
So you have two choices you can use get_or_insert or just do a get, and if you have now value create a new entity.
Any way here is a couple of code samples using the name as part of the key.
name = data['name']
entity = Object.get_or_insert(name)
or
entity = Object.get_by_id(name)
if not entity:
entity = Object(id=name)
entity.put()
Calling .query just creates a query object, it doesn't execute it, so trying to evaluate is as a boolean is wrong. Query object have methods, fetch and get that, respectively, return a list of matching entities, or just one entity.
So your code could be re-written:
objects_qry = Object.query(Object.name == data["name"])
existing_object = objects_qry.get()
if not existing_object:
obj = Object()
obj.name = data["name"]
obj.put()
That said, Tim's point in the comments about using the ID instead of a property makes sense if you really care about names being unique - the code above wouldn't stop two simultaneous requests from creating entities with the same name.
I have these two simple models, A and B:
from django.db import models
class A(models.Model):
name = models.CharField(max_length=10)
class B(A):
age = models.IntegerField()
Now, how can I query for all instances of A which do not have an instance of B?
The only way I found requires an explicitly unique field on each subclass, which is NOT NULL, so that I can do A.objects.filter(b__this_is_a_b=None), for example, to get instances that are not also B instances. I'm looking for a way to do this without adding an explicit silly flag like that.
I also don't want to query for all of the objects and then filter them in Python. I want to get the DB to do it for me, which is basically something like SELECT * FROM A WHERE A.id in (SELECT id from B)
Since some version of django or python this works as well:
A.Objects.all().filter(b__isnull=True)
because if a is an A object a.b gives the subclass B of a when it exists
I Know this is an old question, but my answer might help new searchers on this subject.
see also:
multi table inheritance
And one of my own questions about this: downcasting a super class to a sub class
I'm not sure it's possible to do this purely in the DB with Django's
ORM, in a single query. Here's the best I've been able to do:
A.objects.exclude(id__in=[r[0] for r in B.objects.values_list("a_ptr_id")])
This is 2 DB queries, and works best with a simplistic inheritance
graph - each subclass of A would require a new database query.
Okay, it took a lot of trial and error, but I have a solution. It's
ugly as all hell, and the SQL is probably worse than just going with
two queries, but you can do something like so:
A.objects.exclude(b__age__isnull=True).exclude(b__age_isnull=False)
There's no way to get Django to do the join without referencing a
field on b. But with these successive .exclude()s, you make any A
with a B subclass match one or the other of the excludes. All you're left with are A's
without a B subclass.
Anyway, this is an interesting use case, you should bring it up on django-dev...
I don't work with django, but it looks like you want the isinstance(obj, type) built-in python method.
Edit:
Would A.objects.exclude(id__exact=B__id) work?
I have a method in my Python code that returns a tuple - a row from a SQL query. Let's say it has three fields: (jobId, label, username)
For ease of passing it around between functions, I've been passing the entire tuple as a variable called 'job'. Eventually, however, I want to get at the bits, so I've been using code like this:
(jobId, label, username) = job
I've realised, however, that this is a maintenance nightmare, because now I can never add new fields to the result set without breaking all of my existing code. How should I have written this?
Here are my two best guesses:
(jobId, label, username) = (job[0], job[1], job[2])
...but that doesn't scale nicely when you have 15...20 fields
or to convert the results from the SQL query to a dictionary straight away and pass that around (I don't have control over the fact that it starts life as a tuple, that's fixed for me)
#Staale
There is a better way:
job = dict(zip(keys, values))
I'd say that a dictionary is definitely the best way to do it. It's easily extensible, allows you to give each value a sensible name, and Python has a lot of built-in language features for using and manipulating dictionaries. If you need to add more fields later, all you need to change is the code that converts the tuple to a dictionary and the code that actually makes use of the new values.
For example:
job={}
job['jobid'], job['label'], job['username']=<querycode>
This is an old question, but...
I'd suggest using a named tuple in this situation: collections.namedtuple
This is the part, in particular, that you'd find useful:
Subclassing is not useful for adding new, stored fields. Instead, simply create a new named tuple type from the _fields attribute.
Perhaps this is overkill for your case, but I would be tempted to create a "Job" class that takes the tuple as its constructor argument and has respective properties on it. I'd then pass instances of this class around instead.
I would use a dictionary. You can convert the tuple to a dictionary this way:
values = <querycode>
keys = ["jobid", "label", "username"]
job = dict([[keys[i], values [i]] for i in xrange(len(values ))])
This will first create an array [["jobid", val1], ["label", val2], ["username", val3]] and then convert that to a dictionary. If the result order or count changes, you just need to change the list of keys to match the new result.
PS still fresh on Python myself, so there might be better ways off doing this.
An old question, but since no one mentioned it I'll add this from the Python Cookbook:
Recipe 81252: Using dtuple for Flexible Query Result Access
This recipe is specifically designed for dealing with database results, and the dtuple solution allows you to access the results by name OR index number. This avoids having to access everything by subscript which is very difficult to maintain, as noted in your question.
With a tuple it will always be a hassle to add or change fields. You're right that a dictionary will be much better.
If you want something with slightly friendlier syntax you might want to take a look at the answers this question about a simple 'struct-like' object. That way you can pass around an object, say job, and access its fields even more easily than a tuple or dict:
job.jobId, job.username = jobId, username
If you're using the MySQLdb package, you can set up your cursor objects to return dicts instead of tuples.
import MySQLdb, MySQLdb.cursors
conn = MySQLdb.connect(..., cursorclass=MySQLdb.cursors.DictCursor)
cur = conn.cursor() # a DictCursor
cur2 = conn.cursor(cursorclass=MySQLdb.cursors.Cursor) # a "normal" tuple cursor
How about this:
class TypedTuple:
def __init__(self, fieldlist, items):
self.fieldlist = fieldlist
self.items = items
def __getattr__(self, field):
return self.items[self.fieldlist.index(field)]
You could then do:
j = TypedTuple(["jobid", "label", "username"], job)
print j.jobid
It should be easy to swap self.fieldlist.index(field) with a dictionary lookup later on... just edit your __init__ method! Something like Staale does.