I'm a newcomer to SQLAlchemy ORM and I'm struggling to accomplish complex-ish queries on multiple tables - queries which I find relatively straightforward to do in Doctrine DQL.
I have data objects of Cities, which belong to Countries. Some Cities also have a County ID set, but not all. As well as the necessary primary and foreign keys, each record also has a text_string_id, which links to a TextStrings table which stores the name of the City/County/Country in different languages. The TextStrings MySQL table looks like this:
CREATE TABLE IF NOT EXISTS `text_strings` (
`id` INT UNSIGNED NOT NULL,
`language` VARCHAR(2) NOT NULL,
`text_string` varchar(255) NOT NULL,
PRIMARY KEY (`id`, `language`)
)
I want to construct a breadcrumb for each city, of the form:
country_en_name > city_en_name OR
country_en_name > county_en_name > city_en_name,
depending on whether or not a County attribute is set for this city. In Doctrine this would be relatively straightforward:
$query = Doctrine_Query::create()
->select('ci.id, CONCAT(cyts.text_string, \'> \', IF(cots.text_string is not null, CONCAT(cots.text_string, \'> \', cits.text_string), cits.text_string)) as city_breadcrumb')
->from('City ci')
->leftJoin('ci.TextString cits')
->leftJoin('ci.Country cy')
->leftJoin('cy.TextString cyts')
->leftJoin('ci.County co')
->leftJoin('co.TextString cots')
->where('cits.language = ?', 'en')
->andWhere('cyts.language = ?', 'en')
->andWhere('(cots.language = ? OR cots.language is null)', 'en');
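For comparison, the same breadcrumb can be written in plain SQL. The sketch below runs it through Python's built-in sqlite3 module so it works standalone. SQLite spells CONCAT as || and IF as CASE WHEN. The schema and sample rows are made up for illustration. One design choice differs from the Doctrine query: each language condition sits in the LEFT JOIN's ON clause rather than in WHERE, so a city without a county keeps its row and simply gets NULLs for the county columns.

```python
import sqlite3

# Hypothetical minimal schema mirroring the question: cities belong to a
# country and optionally to a county; every row points at text_strings.id.
SCHEMA = """
CREATE TABLE text_strings (id INTEGER, language TEXT, text_string TEXT,
                           PRIMARY KEY (id, language));
CREATE TABLE countries (id INTEGER PRIMARY KEY, text_string_id INTEGER);
CREATE TABLE counties  (id INTEGER PRIMARY KEY, text_string_id INTEGER);
CREATE TABLE cities    (id INTEGER PRIMARY KEY, country_id INTEGER,
                        county_id INTEGER, text_string_id INTEGER);
"""

# Language filters live in the ON clauses, so a missing county yields NULLs
# instead of removing the city from the result.
BREADCRUMB_SQL = """
SELECT ci.id,
       cyts.text_string || ' > ' ||
       CASE WHEN cots.text_string IS NOT NULL
            THEN cots.text_string || ' > ' || cits.text_string
            ELSE cits.text_string
       END AS city_breadcrumb
FROM cities ci
LEFT JOIN text_strings cits ON cits.id = ci.text_string_id AND cits.language = ?
LEFT JOIN countries cy      ON cy.id = ci.country_id
LEFT JOIN text_strings cyts ON cyts.id = cy.text_string_id AND cyts.language = ?
LEFT JOIN counties co       ON co.id = ci.county_id
LEFT JOIN text_strings cots ON cots.id = co.text_string_id AND cots.language = ?
ORDER BY ci.id
"""


def breadcrumbs(conn, language="en"):
    """Return (city_id, breadcrumb) rows for the given language."""
    return conn.execute(BREADCRUMB_SQL, (language, language, language)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO text_strings VALUES (?, ?, ?)", [
    (1, "en", "England"), (1, "fr", "Angleterre"),
    (2, "en", "Kent"),
    (3, "en", "London"),
    (4, "en", "Canterbury"),
])
conn.execute("INSERT INTO countries VALUES (1, 1)")
conn.execute("INSERT INTO counties VALUES (1, 2)")
conn.executemany("INSERT INTO cities VALUES (?, ?, ?, ?)", [
    (1, 1, None, 3),  # London: no county
    (2, 1, 1, 4),     # Canterbury: in Kent
])
print(breadcrumbs(conn))
```

With the sample rows this should print one breadcrumb per city, with "Kent" appearing only for Canterbury.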
With SQLAlchemy ORM, I'm struggling to achieve the same thing. I believe I've set up the objects correctly, in this form, e.g.:
from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.orm import backref, relationship

class City(Base):
    __tablename__ = "cities"
    id = Column(Integer, primary_key=True)
    country_id = Column(Integer, ForeignKey('countries.id'))
    text_string_id = Column(Integer, ForeignKey('text_strings.id'))
    county_id = Column(Integer, ForeignKey('counties.id'))
    # relationship() replaces the older relation() spelling
    text_strings = relationship(TextString, backref=backref('cities', order_by=id))
    country = relationship(Country, backref=backref('cities', order_by=id))
    county = relationship(County, backref=backref('cities', order_by=id))
My problem is in the querying - I've tried various approaches to generating the breadcrumb but nothing seems to work. Some observations:
Perhaps using things like CONCAT and IF inline in the query is not very pythonic (is it even possible with the ORM?) - so I've tried performing these operations outside SQLAlchemy, in a Python loop of the records. However here I've struggled to access the individual fields - for example the model accessors don't seem to go n-levels deep, e.g. City.counties.text_strings.language doesn't exist.
I've also experimented with using tuples - the closest I've got to it working was by splitting it out into two queries:
from sqlalchemy import and_

# For cities without a county
for city, country in session.query(City, Country).\
        filter(Country.id == City.country_id).\
        filter(City.county_id == None).all():
    if city.text_strings.language == 'en':
        pass  # etc

# For cities with a county
for city, county, country in session.query(City, County, Country).\
        filter(and_(City.county_id == County.id, City.country_id == Country.id)).all():
    if city.text_strings.language == 'en':
        pass  # etc
I split it out into two queries because I couldn't figure out how to make the County join optional in just the one query. But this approach is of course terrible, and worse, the second query didn't work 100%: it wasn't joining all of the different city.text_strings for subsequent filtering.
So I'm stumped! Any help you can give me setting me on the right path for performing these sorts of complex-ish queries in SQLAlchemy ORM would be much appreciated.
The mappings for Country and County are not present, but based on the Doctrine query I would assume each has a text_strings attribute.
The relevant portion of SQLAlchemy documentation describing aliases with joins is at:
http://www.sqlalchemy.org/docs/orm/tutorial.html#using-aliases
and the generation of SQL functions is described at:
http://www.sqlalchemy.org/docs/core/tutorial.html#functions
from sqlalchemy import func, or_
from sqlalchemy.orm import aliased

cyts = aliased(TextString)
cits = aliased(TextString)
cots = aliased(TextString)
cy = aliased(Country)
co = aliased(County)
session.query(
    City.id,
    (
        cyts.text_string +
        '> ' +
        # func.if_ renders as IF(); the trailing underscore avoids the Python keyword
        func.if_(cots.text_string != None,
                 cots.text_string + '> ' + cits.text_string,
                 cits.text_string)
    ).label('city_breadcrumb')
).\
    outerjoin(cits, City.text_strings).\
    outerjoin(cy, City.country).\
    outerjoin(cyts, cy.text_strings).\
    outerjoin(co, City.county).\
    outerjoin(cots, co.text_strings).\
    filter(cits.language == 'en').\
    filter(cyts.language == 'en').\
    filter(or_(cots.language == 'en', cots.language == None))
though I would think it's a heck of a lot simpler to just say:
city.country.text_strings.text_string + " > " + city.county.text_strings.text_string + " > " + city.text_strings.text_string
If you put a descriptor on City (and likewise on Country and County):
class City(object):
    # ...
    @property
    def text_string(self):
        return self.text_strings.text_string
then you could say city.text_string.
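Taking that idea one step further, a breadcrumb property can walk the relationships and simply skip a missing county. The sketch below uses plain dataclasses as stand-ins for the mapped classes so it runs without a database. In the real models, country, county and text_strings would be relationships. It also assumes each text_strings attribute already resolves to the English row; with the composite (id, language) key that does not happen automatically.

```python
from dataclasses import dataclass
from typing import Optional


# Stand-ins for the mapped classes (hypothetical shapes, not the real models).
@dataclass
class TextString:
    language: str
    text_string: str


@dataclass
class Country:
    text_strings: TextString


@dataclass
class County:
    text_strings: TextString


@dataclass
class City:
    text_strings: TextString
    country: Country
    county: Optional[County] = None

    @property
    def breadcrumb(self) -> str:
        # Outermost level first; a city without a county skips that level.
        parts = [self.country.text_strings.text_string]
        if self.county is not None:
            parts.append(self.county.text_strings.text_string)
        parts.append(self.text_strings.text_string)
        return " > ".join(parts)


england = Country(TextString("en", "England"))
kent = County(TextString("en", "Kent"))
print(City(TextString("en", "Canterbury"), england, kent).breadcrumb)
```

Keeping the None check inside the property means callers never need to know whether a city has a county.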
Just for the record, here is the code I ended up using. Mike (zzzeek)'s answer stays as the correct and definitive answer because this is just an adaptation of his, which was the breakthrough for me.
from sqlalchemy import and_
from sqlalchemy.orm import aliased

cits = aliased(TextString)
cyts = aliased(TextString)
cots = aliased(TextString)
# Each language condition is part of the ON clause, so a city without a
# county (or without a translation) still comes back, with None values.
for (city_id, country_text, county_text, city_text) in \
        session.query(City.id, cyts.text_string, cots.text_string, cits.text_string).\
        outerjoin(cits, and_(cits.id == City.text_string_id, cits.language == 'en')).\
        outerjoin(County, City.county).\
        outerjoin(cots, and_(cots.id == County.text_string_id, cots.language == 'en')).\
        outerjoin(Country, City.country).\
        outerjoin(cyts, and_(cyts.id == Country.text_string_id, cyts.language == 'en')):
    # Python to construct the breadcrumb, checking county_text for None-ness
    pass
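The breadcrumb step in that loop could look something like the hypothetical helper below. It consumes the (city_id, country_text, county_text, city_text) tuples the query yields. Because every text comes from an outer join, any of them can be None. With only these columns, a county that lacks an English name looks the same as no county at all. Selecting County.id as well would be needed to tell the two cases apart.

```python
from typing import Iterable, Iterator, Optional, Tuple

# (city_id, country_text, county_text, city_text), as yielded by the query
Row = Tuple[int, Optional[str], Optional[str], Optional[str]]


def breadcrumbs_from_rows(rows: Iterable[Row], sep: str = " > ",
                          missing: str = "?") -> Iterator[Tuple[int, str]]:
    """Yield (city_id, breadcrumb), tolerating None from the outer joins."""
    for city_id, country_text, county_text, city_text in rows:
        # A missing country or city name is shown with a placeholder...
        parts = [country_text if country_text is not None else missing]
        # ...while a missing county level is simply left out.
        if county_text is not None:
            parts.append(county_text)
        parts.append(city_text if city_text is not None else missing)
        yield city_id, sep.join(parts)


rows = [(1, "England", None, "London"), (2, "England", "Kent", "Canterbury")]
print(list(breadcrumbs_from_rows(rows)))
```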
I've got file objects of different types, which inherit from a BaseFile, and add custom attributes, methods and maybe fields. The BaseFile also stores the File Type ID, so that the corresponding subclass model can be retrieved from any BaseFile object:
from datetime import datetime

from django.contrib.contenttypes.models import ContentType
from django.db import models

class BaseFile(models.Model):
    name = models.CharField(max_length=80, db_index=True)
    size = models.PositiveIntegerField()
    time_created = models.DateTimeField(default=datetime.now)
    file_type = models.ForeignKey(ContentType, on_delete=models.PROTECT)

class FileType1(BaseFile):
    storage_path = '/path/for/filetype1/'

    def custom_method(self):
        ...  # some custom behaviour

class FileType2(BaseFile):
    storage_path = '/path/for/filetype2/'
    extra_field = models.CharField(max_length=12)
I also have different types of events which are associated with files:
class FileEvent(models.Model):
    file = models.ForeignKey(BaseFile, on_delete=models.PROTECT)
    time = models.DateTimeField(default=datetime.now)
I want to be able to efficiently get all files of a particular type which have not been involved in a particular event, such as:
unprocessed_files_type1 = FileType1.objects.filter(fileevent__isnull=True)
However, looking at the SQL executed for this query:
SELECT "app_basefile"."id", "app_basefile"."name", "app_basefile"."size", "app_basefile"."time_created", "app_basefile"."file_type_id", "app_filetype1"."basefile_ptr_id"
FROM "app_filetype1"
INNER JOIN "app_basefile"
ON("app_filetype1"."basefile_ptr_id" = "app_basefile"."id")
LEFT OUTER JOIN "app_fileevent" ON ("app_basefile"."id" = "app_fileevent"."file_id")
WHERE "app_fileevent"."id" IS NULL
It looks like this might not be very efficient because it joins on BaseFile.id instead of FileType1.basefile_ptr_id, so it will check ALL BaseFile ids to see whether they are present in FileEvent.file_id, when I only need to check the BaseFile ids corresponding to FileType1, or FileType1.basefile_ptr_ids.
This could make a significant performance difference if there is a very large number of BaseFiles but FileType1 is only a small subset of them, because it would be doing a large number of unnecessary lookups.
Is there a way to force Django to join on "app_filetype1"."basefile_ptr_id" or otherwise achieve this functionality more efficiently?
Thanks for the help
UPDATE:
Using annotations and Exists subquery seems to do what I'm after, however the resulting SQL still appears strange:
from django.db.models import Exists, OuterRef
unprocessed_files_type1 = FileType1.objects.annotate(file_event=Exists(FileEvent.objects.filter(file=OuterRef('pk')))).filter(file_event=False)
SELECT "app_basefile"."id", "app_basefile"."name", "app_basefile"."size", "app_basefile"."time_created", "app_basefile"."file_type_id", "app_filetype1"."basefile_ptr_id",
EXISTS(
SELECT U0."id", U0."file_id", U0."time"
FROM "app_fileevent" U0
WHERE U0."file_id" = ("app_filetype1"."basefile_ptr_id"))
AS "file_event"
FROM "app_filetype1"
INNER JOIN "app_basefile" ON ("app_filetype1"."basefile_ptr_id" = "app_basefile"."id")
WHERE EXISTS(
SELECT U0."id", U0."file_id", U0."time"
FROM "app_fileevent" U0
WHERE U0."file_id" = ("app_filetype1"."basefile_ptr_id")) = 0
It appears to be doing the WHERE EXISTS subquery twice instead of just using the annotated 'file_event' label... Maybe this is just a Django/SQLite driver bug?
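In Django 3.0 and later, an expression that outputs a boolean, such as Exists(), can be passed straight to filter(). That means FileType1.objects.filter(~Exists(FileEvent.objects.filter(file=OuterRef('pk')))) should produce a single NOT EXISTS in WHERE without the extra annotation column. The SQL pattern underneath is an anti-join. The sketch below compares the NOT EXISTS form with the LEFT JOIN ... IS NULL form from the question, using sqlite3 and simplified, made-up tables shaped like Django's multi-table inheritance. It is only meant to show that both return the same files, not to reproduce Django's exact SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_basefile (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE app_filetype1 (basefile_ptr_id INTEGER PRIMARY KEY
                            REFERENCES app_basefile(id));
CREATE TABLE app_fileevent (id INTEGER PRIMARY KEY,
                            file_id INTEGER REFERENCES app_basefile(id));
INSERT INTO app_basefile VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO app_filetype1 VALUES (1), (2), (3);  -- file 4 is another type
INSERT INTO app_fileevent VALUES (10, 1), (11, 4);
""")

# Anti-join as NOT EXISTS, correlated on the subclass pointer column.
NOT_EXISTS = """
SELECT b.id, b.name
FROM app_filetype1 f
INNER JOIN app_basefile b ON f.basefile_ptr_id = b.id
WHERE NOT EXISTS (SELECT 1 FROM app_fileevent e
                  WHERE e.file_id = f.basefile_ptr_id)
ORDER BY b.id
"""

# The LEFT JOIN ... IS NULL form from the question, for comparison.
LEFT_JOIN = """
SELECT b.id, b.name
FROM app_filetype1 f
INNER JOIN app_basefile b ON f.basefile_ptr_id = b.id
LEFT OUTER JOIN app_fileevent e ON b.id = e.file_id
WHERE e.id IS NULL
ORDER BY b.id
"""

unprocessed = conn.execute(NOT_EXISTS).fetchall()
print(unprocessed)
```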
I am trying to create a custom primary_key within my helpdesk/models.py that I will use to track our help desk tickets. I am in the process of writing a small ticketing system for our office.
Maybe there is a better way? Right now I have:
id = models.AutoField(primary_key=True)
This increments in the database as 1, 2, 3, 4...50...
I want to take this id assignment and then use it within a function to combine it with some additional information like the date, and the name, 'HELPDESK'.
The code I was using is as follows:
id = models.AutoField(primary_key=True)
def build_id(self, id):
join_dates = str(datetime.now().strftime('%Y%m%d'))
return (('HELPDESK-' + join_dates) + '-' + str(id))
ticket_id = models.CharField(max_length=15, default=(build_id(None, id)))
The idea is that the entries in the database would be:
HELPDESK-20170813-1
HELPDESK-20170813-2
HELPDESK-20170814-3
...
HELPDESK-20170901-4
...
HELPDESK-20180101-50
...
I want to then use this as the ForeignKey to link the help desk ticket to some other models in the database.
Right now what's coming back is:
HELPDESK-20170813-<django.db.models.fields.AutoField>
This post works: Custom Auto Increment Field Django. I'm curious whether there is a better way; if not, this will suffice.
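The odd value comes from default=(build_id(None, id)) being evaluated once, while the class body runs. At that point id is still the AutoField object rather than a row's number, because the primary key only exists after the row is inserted. Also note that max_length=15 is too short for HELPDESK-20170813-1, which is 19 characters. One common workaround is to keep the integer primary key and derive the display string from it afterwards, for example in a @property or after super().save() in save(). The formatting part can be sketched as a hypothetical helper:

```python
from datetime import date


def build_ticket_id(pk: int, created: date, prefix: str = "HELPDESK") -> str:
    """Format an id such as HELPDESK-20170813-1 from an already-assigned pk."""
    # date supports strftime-style format specs inside f-strings.
    return f"{prefix}-{created:%Y%m%d}-{pk}"


print(build_ticket_id(1, date(2017, 8, 13)))
```

Deriving the string from the saved pk avoids storing a second, possibly inconsistent counter.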
This works for me. It's a slightly modified version from Custom Auto Increment Field Django from above.
models.py
from datetime import datetime

def increment_helpdesk_number():
    last_helpdesk = helpdesk.objects.all().order_by('id').last()
    if not last_helpdesk:
        return 'HEL-' + str(datetime.now().strftime('%Y%m%d-')) + '0000'
    help_id = last_helpdesk.help_num
    # characters 13-16 of 'HEL-YYYYMMDD-NNNN' hold the 4-digit counter
    help_int = help_id[13:17]
    new_help_int = int(help_int) + 1
    new_help_id = 'HEL-' + str(datetime.now().strftime('%Y%m%d-')) + str(new_help_int).zfill(4)
    return new_help_id
It's called like this:
help_num = models.CharField(max_length=17, unique=True, default=increment_helpdesk_number, editable=False)
It gives you the following:
HEL-20170815-0000
HEL-20170815-0001
HEL-20170815-0002
...
The numbering doesn't start over each day, which is something I may look at doing. The more I think about it, however, the less sure I am that I even need the date there, as I already have a creation date field in the model. So I may just change it to:
HEL-000000000
HEL-000000001
HEL-000000002
...
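Restarting the counter each day could be sketched as a pure function that compares the date stamp of the last id with today's. This is a hypothetical helper. It assumes the HEL-YYYYMMDD-NNNN layout with no extra hyphens. Like the original, it is not safe against two tickets being created at the same moment; unique=True would then reject one of them.

```python
from datetime import date
from typing import Optional


def next_helpdesk_number(last_id: Optional[str], today: date,
                         prefix: str = "HEL") -> str:
    """Return the next id like HEL-20170815-0003, restarting at 0000 each day."""
    stamp = today.strftime("%Y%m%d")
    if last_id:
        _, last_stamp, last_seq = last_id.split("-")
        if last_stamp == stamp:
            # Same day: bump the zero-padded counter.
            return f"{prefix}-{stamp}-{int(last_seq) + 1:04d}"
    # No previous ticket, or the last one was from an earlier day.
    return f"{prefix}-{stamp}-0000"


print(next_helpdesk_number("HEL-20170815-0002", date(2017, 8, 15)))
```

If the date is dropped instead, the HEL-000000001 form needs no lookup at all: once the row is saved it is just f"HEL-{pk:09d}".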
I have two models, Storage and Drawer:
class Storage(BaseModel):
    id = PrimaryKeyField()
    name = CharField()
    description = CharField(null=True)

class Drawer(BaseModel):
    id = PrimaryKeyField()
    name = CharField()
    storage = ForeignKeyField(Storage, related_name="drawers")
At the moment I'm producing JSON from a select query:
storages = Storage.select()
As a result I get a JSON array which looks like this:
[{
description: null,
id: 1,
name: "Storage"
},
{
description: null,
id: 2,
name: "Storage 2"
}]
I know that peewee allows querying all drawers of a storage with storage.drawers. But I'm struggling to include in every storage a JSON array containing all of that storage's drawers. I tried to use a join:
storages = (Storage.select(Storage, Drawer)
            .join(Drawer)
            .where(Drawer.storage == Storage.id)
            .group_by(Storage.id))
But I only get back the second storage, which does have drawers, and the array of drawers is not included. Is this even possible with joins? Or do I need to iterate over every storage, retrieve its drawers and append them to the storage?
This is the classic N+1 query problem for ORMs. The documentation goes into some detail on various ways to approach the problem.
For this case, you will probably want prefetch(). Instead of N+1 queries, it will execute O(k) queries, one for each table involved (so 2 in your case).
from peewee import prefetch

storages = Storage.select().order_by(Storage.name)
drawers = Drawer.select().order_by(Drawer.name)
query = prefetch(storages, drawers)
To serialize this, we'll iterate through the Storage objects returned by prefetch. The associated drawers will have been pre-populated using the Drawer.storage foreign key's related_name + '_prefetch' (drawers_prefetch):
accum = []
for storage in query:
    data = {'name': storage.name, 'description': storage.description}
    data['drawers'] = [{'name': drawer.name}
                       for drawer in storage.drawers_prefetch]
    accum.append(data)
To make this even easier you can use the playhouse.shortcuts.model_to_dict helper:
from playhouse.shortcuts import model_to_dict

accum = []
for storage in query:
    accum.append(model_to_dict(storage, backrefs=True, recurse=True))
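Conceptually, prefetch() runs one query per table and then matches children to parents in Python, which is why the count stays at two queries instead of N+1. The stitching step can be sketched with plain dicts standing in for already-fetched rows. This illustrates the idea only; it is not peewee's actual implementation, and attach_children is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def attach_children(parents: List[dict], children: List[dict],
                    fk: str, attr: str, pk: str = "id") -> List[dict]:
    """Attach each child row to its parent in one pass over each list."""
    by_parent: Dict[int, List[dict]] = defaultdict(list)
    for child in children:
        by_parent[child[fk]].append(child)
    for parent in parents:
        # .get() leaves parents without children with an empty list.
        parent[attr] = by_parent.get(parent[pk], [])
    return parents


storages = [{"id": 1, "name": "Storage"}, {"id": 2, "name": "Storage 2"}]
drawers = [{"id": 1, "name": "Top", "storage": 2},
           {"id": 2, "name": "Bottom", "storage": 2}]
print(attach_children(storages, drawers, fk="storage", attr="drawers"))
```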
The documentation on GeoAlchemy2 doesn't seem fully featured (compared to the previous version).
I have a model:
class AddressCode(Base):
    __tablename__ = 'address_codes'
    id = Column(Integer, primary_key=True)
    code = Column(Unicode(34))
    geometry = Column(Geometry('POINT'))
I want to store lat/long data in the above model, for example:
"51.42553,-0.666085"
Which gives me the error:
"Parse error at position 9 within Geometry (the ',' char)"
Anyone able to shed some light on where I am going wrong here?
Also, on the subject, how would I perform a query to, say...
Show nearest 20 users:
class AddressCode(Base):
    __tablename__ = 'address_codes'
    id = Column(Integer, primary_key=True)
    name = Column(Unicode(34))
    geometry = Column(Geometry('POINT'))
Something like?
geom_var = "51.42553,-0.666085"
Session.query(User).filter(func.ST_DWithin, 20, geom_var).all()
In both GeoAlchemy and GeoAlchemy2 you need to specify geometries in the Well-Known Text (WKT) or Well-Known Binary (WKB) format. For a point the syntax is 'POINT(X Y)', so 'POINT(-0.666085 51.42553)'; notice that the longitude comes first, then the latitude.
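A small conversion from the question's "lat,long" string to WKT might look like the sketch below; latlng_to_wkt is a hypothetical helper. With GeoAlchemy2 the resulting string can be wrapped as WKTElement(wkt, srid=4326) if the column is declared with that SRID. The SRID is an assumption here, since the model above does not declare one.

```python
def latlng_to_wkt(latlng: str) -> str:
    """Convert a "lat,lng" string into a WKT point, which lists X (longitude) first."""
    lat_text, lng_text = (part.strip() for part in latlng.split(","))
    lat, lng = float(lat_text), float(lng_text)
    # Catch swapped or garbled input early instead of storing a bad point.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lng <= 180.0):
        raise ValueError(f"coordinates out of range: {latlng!r}")
    return f"POINT({lng} {lat})"


print(latlng_to_wkt("51.42553,-0.666085"))
```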
The shapely module contains useful functions for handling geometries outside relational databases, along with easy conversions between Python geometry classes and WKT, WKB formats.
Here's how you do it. The region table is defined as:
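from geoalchemy2 import Geography
from sqlalchemy import Column, Integer, Sequence, String, Table

# metadata is an existing sqlalchemy.MetaData() instance
regionTable = Table('region', metadata,
    Column('region_id', Integer, Sequence('region_region_id_seq'), primary_key=True),
    Column('type_cd', String(30)),
    Column('region_nm', String(255)),
    Column('geo_loc', Geography),
)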
How to query it (give me all regions within 50 miles of my current location):
from sqlalchemy import func, select

# ST_DWithin on a geography column measures in meters: 50 miles is about 1609 * 50 m
stmt = select(regionTable).where(
    func.ST_DWithin(
        regionTable.c.geo_loc,
        'POINT(-74.78886216922375 40.32829276931833)',
        1609 * 50))
result = connection.execute(stmt)
for row in result:
    print("region name:", row.region_nm)
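For the "nearest 20" part of the question, PostGIS would typically order by distance in SQL and limit the result to 20 rows. To make the ranking logic concrete, here is a brute-force pure-Python sketch using the haversine formula on a sphere. It is only reasonable for small in-memory lists, and its distances will differ slightly from PostGIS geography, which measures on a spheroid by default. The point names and coordinates in the usage line are made-up sample data.

```python
import heapq
import math
from typing import Iterable, List, Optional, Tuple

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (sphere approximation)


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lmb = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lmb / 2) ** 2)
    # min() guards against rounding pushing the argument just above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def nearest(origin: Tuple[float, float],
            points: Iterable[Tuple[str, float, float]],
            k: int = 20, max_m: Optional[float] = None) -> List[Tuple[float, str]]:
    """Return up to k (distance_m, name) pairs closest to origin, optionally within max_m."""
    lat0, lng0 = origin
    candidates = ((haversine_m(lat0, lng0, lat, lng), name)
                  for name, lat, lng in points)
    if max_m is not None:
        candidates = (c for c in candidates if c[0] <= max_m)
    return heapq.nsmallest(k, candidates)


pts = [("paris", 48.8566, 2.3522), ("here", 51.42553, -0.666085),
       ("london", 51.5074, -0.1278)]
print(nearest((51.42553, -0.666085), pts, k=2))
```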
I'm trying to do a bulk insert/update with SQLAlchemy. Here's a snippet:
for od in clist:
    where = and_(Offer.network_id==od['network_id'],
                 Offer.external_id==od['external_id'])
    o = session.query(Offer).filter(where).first()
    if not o:
        o = Offer()
    o.network_id = od['network_id']
    o.external_id = od['external_id']
    o.title = od['title']
    o.updated = datetime.datetime.now()
    payout = od['payout']
    countrylist = od['countries']
    session.add(o)
    session.flush()
    for country in countrylist:
        c = session.query(Country).filter(Country.name==country).first()
        where = and_(OfferPayout.offer_id==o.id,
                     OfferPayout.country_name==country)
        opayout = session.query(OfferPayout).filter(where).first()
        if not opayout:
            opayout = OfferPayout()
        opayout.offer_id = o.id
        opayout.payout = od['payout']
        if c:
            opayout.country_id = c.id
            opayout.country_name = country
        else:
            opayout.country_id = 0
            opayout.country_name = country
        session.add(opayout)
        session.flush()
It looks like my issue was touched on here, http://www.mail-archive.com/sqlalchemy@googlegroups.com/msg05983.html, but I don't know how to use "textual clauses" with session query objects and couldn't find much (though admittedly I haven't had as much time as I'd like to search).
I'm new to SQLAlchemy and I'd imagine there are some issues in the code besides the fact that it throws an exception on a duplicate key. For example, it does a flush after every iteration of clist (but I don't know how else to get the o.id value that is used in the subsequent OfferPayout inserts).
Guidance on any of these issues is very appreciated.
The way you should be doing these things is with session.merge().
You should also be using your objects' relationship properties. So the o above should have an o.offerpayout attribute that is a list of OfferPayout objects, and each offerpayout should have an offerpayout.country property holding the related Country object.
So the above would look something like
for od in clist:
    o = Offer()
    o.network_id = od['network_id']
    o.external_id = od['external_id']
    o.title = od['title']
    o.updated = datetime.datetime.now()
    payout = od['payout']
    countrylist = od['countries']
    for country in countrylist:
        opayout = OfferPayout()
        opayout.payout = od['payout']
        country_obj = Country()
        country_obj.name = country
        opayout.country = country_obj
        o.offerpayout.append(opayout)
    session.merge(o)
session.flush()
This should work as long as all the primary keys are correct (i.e. the country table has name as its primary key, and Offer's primary key is made of the fields you set, such as network_id and external_id). Merge essentially checks the primary keys and, if a matching row exists, merges your object into the one in the database (it also cascades down the relationships).
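Because merge() decides between insert and update purely by primary key, it fits when the natural key (network_id, external_id) is the primary key. A different technique, a database-level upsert, does the same thing in a single statement. SQLAlchemy 1.4+ exposes it through dialect-specific constructs such as sqlalchemy.dialects.sqlite.insert(...).on_conflict_do_update(), and PostgreSQL has an equivalent. The sketch below shows the raw SQL with Python's sqlite3 on a made-up offers table; it needs SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical offers table keyed on the natural key from the question.
conn.execute("""
CREATE TABLE offers (
    network_id  INTEGER NOT NULL,
    external_id TEXT    NOT NULL,
    title       TEXT,
    PRIMARY KEY (network_id, external_id)
)
""")

# Insert, or update the title when the (network_id, external_id) key exists.
UPSERT = """
INSERT INTO offers (network_id, external_id, title) VALUES (?, ?, ?)
ON CONFLICT (network_id, external_id) DO UPDATE SET title = excluded.title
"""


def upsert_offers(conn, rows):
    """Insert new offers and update existing ones in one executemany call."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, rows)


upsert_offers(conn, [(1, "a", "Old title")])
upsert_offers(conn, [(1, "a", "New title"), (1, "b", "Other")])
print(conn.execute("SELECT * FROM offers ORDER BY external_id").fetchall())
```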