Need help with joins in sqlalchemy - python

I'm new to Python, as well as SQLAlchemy, but not to the underlying development and database concepts. I know what I want to do and how I'd do it manually, but I'm trying to learn how an ORM works.
I have two tables, Images and Keywords. The Images table contains an id column that is its primary key, as well as some other metadata. The Keywords table contains only an id column (foreign key to Images) and a keyword column. I'm trying to properly declare this relationship using the declarative syntax, which I think I've done correctly.
Base = declarative_base()
class Keyword(Base):
    __tablename__ = 'Keywords'
    __table_args__ = {'mysql_engine' : 'InnoDB'}
    id = Column(Integer, ForeignKey('Images.id', ondelete='CASCADE'),
                primary_key=True)
    keyword = Column(String(32), primary_key=True)
class Image(Base):
    __tablename__ = 'Images'
    __table_args__ = {'mysql_engine' : 'InnoDB'}
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(256), nullable=False)
    keywords = relationship(Keyword, backref='image')
This represents a many-to-many relationship. One image can have many keywords, and one keyword can relate back to many images.
I want to do a keyword search of my images. I've tried the following with no luck.
Conceptually this would've been nice, but I understand why it doesn't work.
image = session.query(Image).filter(Image.keywords.contains('boy'))
I keep getting errors about no foreign key relationship, which seems clearly defined to me. I saw something about making sure I get the right 'join', and I'm using 'from sqlalchemy.orm import join', but still no luck.
image = session.query(Image).select_from(join(Image, Keyword)).\
    filter(Keyword.keyword == 'boy')
I added the specific join clause to the query to help it along, though as I understand it, I shouldn't have to do this.
image = session.query(Image).select_from(join(Image, Keyword,
    Image.id==Keyword.id)).filter(Keyword.keyword == 'boy')
So finally I switched tactics and tried querying the keywords and then using the backreference. However, when I try to use the '.images' iterating over the result, I get an error that the 'image' property doesn't exist, even though I did declare it as a backref.
result = session.query(Keyword).filter(Keyword.keyword == 'boy').all()
I want to be able to query a unique set of image matches on a set of keywords. I just can't guess my way to the syntax, and I've spent days reading the SQLAlchemy documentation trying to piece this together myself.
I would very much appreciate anyone who can point out what I'm missing.

It appears that I was still getting the wrong version of join, even importing the one under sqlalchemy.orm. I did this to resolve the problem:
from sqlalchemy.orm.util import join as join_
image = session.query(Image).select_from(join_(Image, Keyword)).\
    filter(Keyword.keyword == 'boy')
Is that really the "most right" solution, or am I missing some nuance of Python? Since I'm still learning, I'd like to do things the "most right" way as advised by those with more experience. Thanks.
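For what it's worth, a search like this is often written by joining along the relationship itself with Query.join(), which avoids importing a standalone join() at all. The following is only a minimal sketch, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database standing in for MySQL; the sample rows and file names are invented for illustration.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Keyword(Base):
    __tablename__ = "Keywords"
    id = Column(Integer, ForeignKey("Images.id", ondelete="CASCADE"),
                primary_key=True)
    keyword = Column(String(32), primary_key=True)


class Image(Base):
    __tablename__ = "Images"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(256), nullable=False)
    keywords = relationship(Keyword, backref="image")


# Assumption: in-memory SQLite instead of the MySQL/InnoDB setup in the question.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Invented sample data.
    session.add_all([
        Image(name="beach.jpg",
              keywords=[Keyword(keyword="boy"), Keyword(keyword="sea")]),
        Image(name="city.jpg", keywords=[Keyword(keyword="car")]),
    ])
    session.commit()

    # Join along the mapped relationship, filter on a set of keywords,
    # and ask for each matching image only once.
    matches = (
        session.query(Image)
        .join(Image.keywords)
        .filter(Keyword.keyword.in_(["boy", "sea"]))
        .distinct()
        .all()
    )
    names = sorted(image.name for image in matches)
```

Because the join target is the relationship attribute, SQLAlchemy takes the ON clause from the foreign key declared on Keyword.id, so no explicit join condition is spelled out.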

Related

Array type in SQLite

I'm in the middle of developing a small site in Python. I use Flask and venv.
I am currently writing the database, and here is one of my tables:
class Message(db.Model):
    message_id = db.Column(db.Integer, primary_key=True)
    session_id = db.Column(db.String(30), unique=True)
    application_id = db.Column(db.Integer)
    participants = db.Column(db.Array())
    content = db.Column(db.String(200))
The problem is in line 5:
"Array".
There is no such variable type.
I want to create a list of message recipients. Is there an Array or List variable type in SQLite?
If so, what is it and how is it used?
And if not, how can I make a list of recipients anyway?
Anyone know?
Thank you very much!
SQLite does not support arrays directly. Its storage classes are NULL, INTEGER, REAL, TEXT and BLOB. See the SQLite datatype documentation for the types it supports.
To accomplish what you need, you have to use a custom encoding, or use an FK, i.e. create another table, where each item in the array is stored as a row. This would get tedious in my opinion.
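The separate-table option can be sketched with the standard-library sqlite3 module, independent of Flask or any ORM. The table and column names below (message_participant, recipient) are made up for illustration and are not part of the question's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE message (
        message_id INTEGER PRIMARY KEY,
        content    TEXT
    );
    -- Hypothetical child table: one row per (message, recipient) pair.
    CREATE TABLE message_participant (
        message_id INTEGER NOT NULL REFERENCES message(message_id),
        recipient  TEXT NOT NULL,
        PRIMARY KEY (message_id, recipient)
    );
""")

conn.execute("INSERT INTO message (message_id, content) VALUES (1, 'hello')")
conn.executemany(
    "INSERT INTO message_participant (message_id, recipient) VALUES (?, ?)",
    [(1, "alice"), (1, "bob")],
)

# Reading the "array" back is an ordinary query on the child table.
recipients = [
    row[0]
    for row in conn.execute(
        "SELECT recipient FROM message_participant "
        "WHERE message_id = ? ORDER BY recipient",
        (1,),
    )
]
```

The extra table costs a join on every read, but each recipient stays individually queryable and indexable, which a serialized list does not offer.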
Alternatively, it can be done in SQLAlchemy and you will want to have a look at the PickleType:
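array = db.Column(db.PickleType(mutable=True))
Please note that on the older SQLAlchemy releases that accept it, the mutable=True parameter is what lets you edit the column in place: SQLAlchemy detects the changes automatically and saves them when you commit. The flag has since been deprecated and removed; current releases track in-place changes through the sqlalchemy.ext.mutable extension instead.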
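On current SQLAlchemy releases, in-place change tracking for a pickled list is usually set up with MutableList from sqlalchemy.ext.mutable. A minimal sketch, assuming SQLAlchemy 1.4 or later and plain declarative classes rather than Flask-SQLAlchemy's db.Model; the trimmed-down Message class is only illustrative.

```python
from sqlalchemy import Column, Integer, PickleType, create_engine
from sqlalchemy.ext.mutable import MutableList
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Message(Base):
    __tablename__ = "message"
    message_id = Column(Integer, primary_key=True)
    # MutableList wraps the pickled value so that append(), remove(), etc.
    # mark the attribute as changed.
    participants = Column(MutableList.as_mutable(PickleType))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    msg = Message(participants=["alice"])
    session.add(msg)
    session.commit()

    msg.participants.append("bob")  # in-place change, picked up at commit
    session.commit()

    # Throw away cached state and read the row back from the database.
    session.expire_all()
    stored = list(session.get(Message, msg.message_id).participants)
```

Without the MutableList wrapper, the append() would go unnoticed and only reassigning the whole attribute would trigger an UPDATE.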
Also, have a look at the ScalarListType in the SQLAlchemy-Utils package for saving multiple values in a column.
Update:
In SQLAlchemy you can use an ARRAY column.
For example:
class Example(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    my_array = db.Column(db.ARRAY(db.Integer()))
# You can easily find records:
# Example.query.filter(Example.my_array.contains([1, 2, 3])).all()
# You can use text items in the array:
# db.Column(db.ARRAY(db.Text()))
Update: This doesn't work in SQLite; SQLAlchemy's ARRAY type is for Postgres databases only. The best alternative for most people would be something involving JSON, or switching to Postgres if possible. I'll be attempting JSON myself. Credit to the replier in the comments.
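As a rough illustration of the JSON route, the list can be serialized to a TEXT column on the way in and parsed on the way out. This sketch uses only the standard-library sqlite3 and json modules; the helper names save_participants and load_participants are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message (message_id INTEGER PRIMARY KEY, participants TEXT)"
)


def save_participants(message_id, participants):
    """Store the list as a JSON string in a TEXT column."""
    conn.execute(
        "INSERT INTO message (message_id, participants) VALUES (?, ?)",
        (message_id, json.dumps(participants)),
    )


def load_participants(message_id):
    """Read the JSON string back and decode it into a Python list."""
    row = conn.execute(
        "SELECT participants FROM message WHERE message_id = ?",
        (message_id,),
    ).fetchone()
    return json.loads(row[0])


save_participants(1, ["alice", "bob"])
loaded = load_participants(1)
```

The trade-off is that the database sees an opaque string, so searching for a single recipient means either a LIKE scan or SQLite's JSON functions where they are available.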

Filter SQLAlchemy one-to-many with "does not contain"

I'm attempting to query and filter on a one-to-many relationship and cannot seem to figure out how to do this.
Here are my mappings (trimmed for brevity):
class Bug(Base):
    __tablename__ = 'bug'
    id = Column('bug_id', Integer, primary_key=True)
    tags = relationship('Tag', backref='bug')
class Tag(Base):
    id = Column('tag_id', Integer, primary_key=True)
    name = Column('tag_name', String)
    bug_id = Column('bug_id', ForeignKey('bug.bug_id'))
I want to be able to find all bugs that do not have a tag with the name "foo".
You can use the any() operator on the relationship.
bugs_without_foo = session.query(Bug).filter(
    ~Bug.tags.any(Tag.name == 'foo')
).all()
It's nicer to look at, but it could be less efficient over very large data sets than the subquery from Dan Lenski's answer.
I am not sure exactly what the Tag table is supposed to represent for you, but it is odd that your schema associates each Tag with exactly one Bug. If you want to tag multiple Bugs with a tag of the same name, you will be creating multiple rows in the Tag table with the same name. This would seem to violate the 3rd normal form.
The standard way to describe a tag cloud in a database would be to use a many-to-many relationship with a secondary "association" table that associates (bug,tag) pairs. The SQLAlchemy docs have a very nice tutorial on this pattern.
If you stick with your schema as-is, there are several ways to do it.
Client-side filtering
This is obviously inefficient but it is easy to understand. You go through the bugs one by one, go through their tags one by one, and eliminate the bugs where tag.name=="foo":
non_foo_bugs = [ bug for bug in session.query(Bug)
                 if not any(tag.name=="foo" for tag in bug.tags) ]
Two queries
Find all distinct bugs that are tagged "foo", and then find the complement of that set.
This version uses exactly two queries of the database:
foo_bugs = [t.bug_id for t in session.query(Tag).filter_by(name="foo").distinct()]
session.query(Bug).filter(~Bug.id.in_(foo_bugs))
One query with a subquery
Same as the above, but make foo_bugs a subquery, since there's no reason to fetch its contents on the client side:
foo_bugs = session.query(Tag.bug_id).filter_by(name="foo").distinct().subquery()
session.query(Bug).filter(~Bug.id.in_(foo_bugs))
This would be an uncorrelated subquery, so from the server point of view it should be optimized just about the same as two separate queries.
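At the SQL level, the any() filter and the subquery version differ roughly as NOT EXISTS with a correlated subquery versus NOT IN with an uncorrelated one. The sketch below runs both forms directly against an in-memory SQLite database using the standard-library sqlite3 module; the sample rows are invented, and the SQL is hand-written to show the shape of the two queries, not the exact statements SQLAlchemy emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bug (bug_id INTEGER PRIMARY KEY);
    CREATE TABLE tag (
        tag_id   INTEGER PRIMARY KEY,
        tag_name TEXT,
        bug_id   INTEGER REFERENCES bug(bug_id)
    );
    -- Invented data: bug 1 is tagged foo, bug 2 only bar, bug 3 has no tags.
    INSERT INTO bug (bug_id) VALUES (1), (2), (3);
    INSERT INTO tag (tag_name, bug_id)
        VALUES ('foo', 1), ('bar', 1), ('bar', 2);
""")

# Correlated form: the inner query refers to the outer bug row.
not_exists = [r[0] for r in conn.execute("""
    SELECT bug.bug_id FROM bug
    WHERE NOT EXISTS (
        SELECT 1 FROM tag
        WHERE tag.bug_id = bug.bug_id AND tag.tag_name = 'foo'
    )
    ORDER BY bug.bug_id
""")]

# Uncorrelated form: the inner query can be evaluated once on its own.
not_in = [r[0] for r in conn.execute("""
    SELECT bug.bug_id FROM bug
    WHERE bug.bug_id NOT IN (
        SELECT DISTINCT bug_id FROM tag WHERE tag_name = 'foo'
    )
    ORDER BY bug.bug_id
""")]
```

Both forms also return bugs that have no tags at all, which a plain inner join filtered on tag_name != 'foo' would miss.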

SQLAlchemy, using the same model with multiple tables

I have data for a particular entity partitioned across multiple identical tables, often separated chronologically or by numeric range. For instance, I may have a table called mytable for current data, a mytable_2013 for last year's data, mytable_2012, and so on.
Only the current table is ever written to. The others are only consulted. With SQLAlchemy, is there any way I can specify the table to query from when using the declarative model?
Use mixins and change table names by an object property.
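class NodeMixin(object):
    # Shared columns; the mixin itself is not mapped to any table.
    nid = Column(Integer, primary_key=True)
    uuid = Column(String(128))
    vid = Column(Integer)
class Node(NodeMixin, Base):
    __tablename__ = 'node'
class Node1(NodeMixin, Base):
    __tablename__ = 'node_1'
class Node2(NodeMixin, Base):
    __tablename__ = 'node_2'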
As requested, re-posting as answer:
Please take a look at this answer to Mapping lots of similar tables in SQLAlchemy in the Concrete Table Inheritance section.
In your case you can query MyTable only when working with the current data, and do a polymorphic search on all tables when you need the whole history.
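In SQL terms, a search across all the partitions amounts to something like a UNION ALL over the identical tables, while everyday queries touch only the current one. Here is a small standard-library sketch of that idea, using the table names from the question; the id and payload columns and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Fixed, trusted table names from the question (never build SQL like this
# from user input).
tables = ("mytable", "mytable_2013", "mytable_2012")

conn = sqlite3.connect(":memory:")
for table in tables:
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, payload TEXT)")

conn.execute("INSERT INTO mytable VALUES (1, 'current')")
conn.execute("INSERT INTO mytable_2013 VALUES (1, 'last year')")
conn.execute("INSERT INTO mytable_2012 VALUES (1, 'two years ago')")

# Day-to-day work only touches the current table.
current = [r[0] for r in conn.execute("SELECT payload FROM mytable")]

# The whole history: one SELECT per table, glued together with UNION ALL,
# with an extra column recording which table each row came from.
history_sql = " UNION ALL ".join(
    f"SELECT '{table}' AS source, id, payload FROM {table}" for table in tables
)
history = conn.execute(history_sql + " ORDER BY source").fetchall()
```

The source column plays the part of a discriminator: it is what lets the caller tell which partition a given row belongs to once the results are merged.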

Proper use of MySQL full text search with SQLAlchemy

I would like to be able to full text search across several text fields of one of my SQLAlchemy mapped objects. I would also like my mapped object to support foreign keys and transactions.
I plan to use MySQL to run the full text search. However, I understand that MySQL can only run full text search on a MyISAM table, which does not support transactions and foreign keys.
In order to accomplish my objective I plan to create two tables. My code will look something like this:
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    description = Column(Text)
users_myisam = Table('users_myisam', Base.metadata,
                     Column('id', Integer),
                     Column('name', String(50)),
                     Column('description', Text),
                     mysql_engine='MyISAM')
conn = Base.metadata.bind.connect()
conn.execute("CREATE FULLTEXT INDEX idx_users_ftxt \
    on users_myisam (name, description)")
Then, to search I will run this:
q = 'monkey'
ft_search = users_myisam.select("MATCH (name,description) AGAINST ('%s')" % q)
result = ft_search.execute()
for row in result: print row
This seems to work, but I have a few questions:
Is my approach of creating two tables to solve my problem reasonable? Is there a standard/better/cleaner way to do this?
Is there a SQLAlchemy way to create the fulltext index, or am I best to just directly execute "CREATE FULLTEXT INDEX ..." as I did above?
Looks like I have a SQL injection problem in my search/match against query. How can I do the select the "SQLAlchemy way" to fix this?
Is there a clean way to join the users_myisam select/match against right back to my user table and return actual User instances, since this is what I really want?
In order to keep my users_myisam table in sync with my mapped object user table, does it make sense for me to use a MapperExtension on my User class, and set the before_insert, before_update, and before_delete methods to update the users_myisam table appropriately, or is there some better way to accomplish this?
Thanks,
Michael
Is my approach of creating two tables to solve my problem reasonable?
Is there a standard/better/cleaner way to do this?
I've not seen this use case attempted before, as developers who value transactions and constraints tend to use PostgreSQL in the first place. I understand that may not be possible in your specific scenario.
Is there a SQLAlchemy way to create the fulltext index, or am I best
to just directly execute "CREATE FULLTEXT INDEX ..." as I did above?
conn.execute() is fine, though if you want something slightly more integrated you can use the DDL() construct. Read through http://docs.sqlalchemy.org/en/rel_0_8/core/schema.html?highlight=ddl#customizing-ddl for details.
Looks like I have a SQL injection problem in my search/match against query. How can I do the
select the "SQLAlchemy way" to fix this?
Note: this recipe is only for MATCH against multiple columns simultaneously. If you have just one column, use the simpler match() operator.
Most basically you could use the text() construct:
from sqlalchemy import text, bindparam
users_myisam.select(
    text("MATCH (name,description) AGAINST (:value)",
         bindparams=[bindparam('value', q)])
)
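To see why the bound parameter closes the injection hole, it can help to compile the clause and look at what would be sent. The sketch below assumes a newer SQLAlchemy (0.9 or later), where text() objects have a bindparams() method; the hostile search string is invented, and nothing here talks to a database.

```python
from sqlalchemy import text

# An invented, deliberately hostile search term.
q = "monkey'); DROP TABLE users; --"

# The value is attached as a bound parameter, not formatted into the SQL.
clause = text("MATCH (name, description) AGAINST (:value)").bindparams(value=q)

compiled = clause.compile()
sql = str(compiled)       # the SQL string still contains the placeholder
params = compiled.params  # the value travels separately
```

The SQL text keeps the :value placeholder and the search term is handed to the driver as data, so quotes inside it are never interpreted as SQL.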
More comprehensively you could define a custom construct:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import ClauseElement
from sqlalchemy import literal
class Match(ClauseElement):
    def __init__(self, columns, value):
        self.columns = columns
        self.value = literal(value)
@compiles(Match)
def _match(element, compiler, **kw):
    return "MATCH (%s) AGAINST (%s)" % (
        ", ".join(compiler.process(c, **kw) for c in element.columns),
        compiler.process(element.value)
    )
my_table.select(Match([my_table.c.a, my_table.c.b], "some value"))
docs:
http://docs.sqlalchemy.org/en/rel_0_8/core/compiler.html
Is there a clean way to join the users_myisam select/match against right back
to my user table and return actual User instances, since this is what I really want?
You should probably create a UserMyISAM class, map it just like User, then use relationship() to link the two classes together. Simple operations like this are then possible:
query(User).join(User.search_table).\
    filter(Match([UserSearch.x, UserSearch.y], "some value"))
In order to keep my users_myisam table in sync with my mapped object
user table, does it make sense for me to use a MapperExtension on my
User class, and set the before_insert, before_update, and
before_delete methods to update the users_myisam table appropriately,
or is there some better way to accomplish this?
MapperExtensions are deprecated, so you'd at least use the event API, and in most cases we want to try applying object mutations outside of the flush process. In this case, I'd be using the constructor for User, or alternatively the init event, as well as a basic @validates decorator which will receive values for the target attributes on User and copy those values into User.search_table.
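A rough sketch of the constructor plus @validates idea might look like the following. It assumes SQLAlchemy 1.4 or later; the UserSearch class name, the ForeignKey on its id column (MyISAM would not enforce it, it is only there so relationship() can work out the join) and the _copy_to_search_table method name are all assumptions, not something taken from the question.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Text
from sqlalchemy.orm import declarative_base, relationship, validates

Base = declarative_base()


class UserSearch(Base):
    # Hypothetical mapping of the MyISAM copy; engine options are omitted.
    __tablename__ = "users_myisam"
    id = Column(Integer, ForeignKey("users.id"), primary_key=True)
    name = Column(String(50))
    description = Column(Text)


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    description = Column(Text)
    search_table = relationship(UserSearch, uselist=False)

    def __init__(self, **kwargs):
        # Create the companion row first so the validator has a target.
        self.search_table = UserSearch()
        super().__init__(**kwargs)

    @validates("name", "description")
    def _copy_to_search_table(self, key, value):
        # Runs on every attribute set, at assignment time rather than
        # during the flush.
        setattr(self.search_table, key, value)
        return value


user = User(name="monkey", description="likes bananas")
user.name = "gorilla"
```

Note that this only covers objects built through the constructor; a User loaded from the database without a matching search row would need separate handling.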
Overall, if you've been learning SQLAlchemy from another source (like the O'Reilly book), it's really out of date by many years, and I'd be focusing on the current online documentation.

Why is SQLAlchemy/associationproxy duplicating my tags?

I'm trying to use association proxy for tags, in a very similar scenario to the example in the docs. Here is a subset of my schema (it's a blog), using declarative:
class Tag(Base):
    __tablename__ = 'tags'
    id = Column(Integer, primary_key=True)
    tag = Column(Unicode(255), unique=True, nullable=False)
class EntryTag(Base):
    __tablename__ = 'entrytags'
    entry_id = Column(Integer, ForeignKey('entries.id'), key='entry', primary_key=True)
    tag_id = Column(Integer, ForeignKey('tags.id'), key='tag', primary_key=True)
class Entry(Base):
    __tablename__ = 'entries'
    id = Column(Integer, primary_key=True)
    subject = Column(Unicode(255), nullable=False)
    # some other fields here
    _tags = relation('Tag', backref='entries', secondary=EntryTag.__table__)
    tags = association_proxy('_tags','tag')
Here's how I'm trying to use it:
>>> e = db.query(Entry).first()
>>> e.tags
[u'foo']
>>> e.tags = [u'foo', u'bar'] # really this is from a comma-separated input
>>> db.commit()
Traceback (most recent call last):
[...]
sqlalchemy.exc.IntegrityError: (IntegrityError) duplicate key value violates unique constraint "tags_tag_key"
'INSERT INTO tags (id, tag) VALUES (%(id)s, %(tag)s)' {'tag': 'bar', 'id': 11L}
>>> map(lambda t:(t.id,t.tag), db.query(Tag).all())
[(1, u'foo'), (2, u'bar'), (3, u'baz')]
The tag u'bar' already existed with id 2; why didn't SQLAlchemy just attach that one instead of trying to create it? Is my schema wrong somehow?
Disclaimer: it's been ages since I used SQLAlchemy so this is more of a guess than anything.
It looks like you're expecting SQLAlchemy to magically take the string 'bar' and look up the relevant Tag for it when performing the insert on the many-to-many table. I expect this is invalid, because the field in question ('tag') is not a primary key.
Imagine a similar situation where your Tag table is actually Comment, also with an id and a text field. You'd expect to be able to add Comments to an Entry with the same e.comments = [u'Foo', u'Bar'] syntax that you've used above, but you'd want it to just perform INSERTs, not check for existing comments with the same content.
So that is probably what it's doing here, but it hits the uniqueness constraint on your tag name and fails, assuming that you're attempting to do the wrong thing.
How to fix it? Making tags.tag the primary key is arguably the correct thing to do, although I don't know how efficient that is nor how well SQLAlchemy handles it. Failing that, try querying for Tag objects by name before assigning them to the entry. You may have to write a little utility function that takes a unicode string and either returns an existing Tag or creates a new one for you.
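The utility function mentioned above could look something like this. It is only a sketch, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database; get_or_create_tag is a hypothetical name, and the Tag class is copied from the question.

```python
from sqlalchemy import Column, Integer, Unicode, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Tag(Base):
    __tablename__ = "tags"
    id = Column(Integer, primary_key=True)
    tag = Column(Unicode(255), unique=True, nullable=False)


def get_or_create_tag(session, name):
    """Return the existing Tag called `name`, or add a new one to the session."""
    tag = session.query(Tag).filter_by(tag=name).one_or_none()
    if tag is None:
        tag = Tag(tag=name)
        session.add(tag)
    return tag


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Tag(tag="bar"))  # 'bar' already exists, as in the question
    session.commit()

    existing = get_or_create_tag(session, "bar")  # found, not re-inserted
    created = get_or_create_tag(session, "baz")   # new row
    again = get_or_create_tag(session, "baz")     # same pending object
    session.commit()

    tag_names = sorted(t.tag for t in session.query(Tag))
    same_object = created is again
```

With a helper like this, the entry's underlying relationship would be assigned Tag objects (for example a list built by calling the helper once per name) instead of raw strings through the proxy. Two concurrent sessions could still race on the unique constraint, which this sketch does not try to handle.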
I've never used SQLAlchemy 0.5 yet (my last app using it was 0.4 based) but I can see one quirk in your code: you should modify the association_proxy object, not reassign it.
Try doing something like:
e.tags.append(u"bar")
Instead of
e.tags = ...
If that doesn't work, try pasting a complete working example for those tables (including the imports, please!) and I'll give you some more advice.
