Proper use of MySQL full text search with SQLAlchemy - python

I would like to be able to full text search across several text fields of one of my SQLAlchemy mapped objects. I would also like my mapped object to support foreign keys and transactions.
I plan to use MySQL to run the full text search. However, I understand that MySQL can only run full text search on a MyISAM table, which does not support transactions and foreign keys.
In order to accomplish my objective I plan to create two tables. My code will look something like this:
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    description = Column(Text)

users_myisam = Table('users_myisam', Base.metadata,
    Column('id', Integer),
    Column('name', String(50)),
    Column('description', Text),
    mysql_engine='MyISAM')

conn = Base.metadata.bind.connect()
conn.execute("CREATE FULLTEXT INDEX idx_users_ftxt \
    on users_myisam (name, description)")
Then, to search I will run this:
q = 'monkey'
ft_search = users_myisam.select("MATCH (name,description) AGAINST ('%s')" % q)
result = ft_search.execute()
for row in result:
    print row
This seems to work, but I have a few questions:
Is my approach of creating two tables to solve my problem reasonable? Is there a standard/better/cleaner way to do this?
Is there a SQLAlchemy way to create the fulltext index, or am I best to just directly execute "CREATE FULLTEXT INDEX ..." as I did above?
Looks like I have a SQL injection problem in my search/match against query. How can I do the select the "SQLAlchemy way" to fix this?
Is there a clean way to join the users_myisam select/match against right back to my user table and return actual User instances, since this is what I really want?
In order to keep my users_myisam table in sync with my mapped object user table, does it make sense for me to use a MapperExtension on my User class, and set the before_insert, before_update, and before_delete methods to update the users_myisam table appropriately, or is there some better way to accomplish this?
Thanks,
Michael

Is my approach of creating two tables to solve my problem reasonable?
Is there a standard/better/cleaner way to do this?
I've not seen this use case attempted before, as developers who value transactions and constraints tend to use PostgreSQL in the first place. I understand that may not be possible in your specific scenario.
Is there a SQLAlchemy way to create the fulltext index, or am I best
to just directly execute "CREATE FULLTEXT INDEX ..." as I did above?
conn.execute() is fine, though if you want something slightly more integrated you can use the DDL() construct; read through http://docs.sqlalchemy.org/en/rel_0_8/core/schema.html?highlight=ddl#customizing-ddl for details.
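For example, a minimal sketch using DDL() with the event system, so that the index is created along with the table:

from sqlalchemy import DDL, event

# Run the FULLTEXT index DDL whenever users_myisam is created.
event.listen(
    users_myisam,
    'after_create',
    DDL("CREATE FULLTEXT INDEX idx_users_ftxt "
        "ON users_myisam (name, description)"))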
Looks like I have a SQL injection problem in my search/match against query. How can I do the
select the "SQLAlchemy way" to fix this?
note: this recipe is only for MATCH against multiple columns simultaneously; if you have just one column, use the match() operator more simply, e.g.:
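# match() renders as MATCH (description) AGAINST (...) on MySQL, with the
# search term passed as a bound parameter.
users_myisam.select(users_myisam.c.description.match(q))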
most basically you could use the text() construct:
from sqlalchemy import text, bindparam

users_myisam.select(
    text("MATCH (name,description) AGAINST (:value)",
         bindparams=[bindparam('value', q)])
)
more comprehensively you could define a custom construct:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import ClauseElement
from sqlalchemy import literal

class Match(ClauseElement):
    def __init__(self, columns, value):
        self.columns = columns
        self.value = literal(value)

@compiles(Match)
def _match(element, compiler, **kw):
    return "MATCH (%s) AGAINST (%s)" % (
        ", ".join(compiler.process(c, **kw) for c in element.columns),
        compiler.process(element.value)
    )

my_table.select(Match([my_table.c.a, my_table.c.b], "some value"))
docs:
http://docs.sqlalchemy.org/en/rel_0_8/core/compiler.html
Is there a clean way to join the users_myisam select/match against right back
to my user table and return actual User instances, since this is what I really want?
you should probably create a second mapped class for the MyISAM table (call it UserSearch), map it just like User, then use relationship() to link the two classes together; then simple operations like this are possible:
query(User).join(User.search_table).\
    filter(Match([UserSearch.x, UserSearch.y], "some value"))
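A rough sketch of that mapping, under the assumption that the search row shares the user's id; since MyISAM can't enforce a real foreign key, the join condition is spelled out with foreign():

from sqlalchemy.orm import relationship, foreign

class UserSearch(Base):
    __tablename__ = 'users_myisam'
    __table_args__ = {'mysql_engine': 'MyISAM', 'extend_existing': True}
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    description = Column(Text)

# No database-level FK on MyISAM, so annotate the join condition manually.
User.search_table = relationship(
    UserSearch,
    primaryjoin=User.id == foreign(UserSearch.id),
    uselist=False)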
In order to keep my users_myisam table in sync with my mapped object
user table, does it make sense for me to use a MapperExtension on my
User class, and set the before_insert, before_update, and
before_delete methods to update the users_myisam table appropriately,
or is there some better way to accomplish this?
MapperExtensions are deprecated, so you'd at least use the event API, and in most cases we want to try applying object mutations outside of the flush process. In this case, I'd be using the constructor for User, or alternatively the init event, as well as a basic @validates decorator which will receive values for the target attributes on User and copy those values into User.search_table.
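A rough sketch of the @validates approach, assuming the UserSearch class and the search_table relationship sketched above:

from sqlalchemy.orm import validates

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    description = Column(Text)

    @validates('name', 'description')
    def _sync_search_row(self, key, value):
        # Mirror each change onto the MyISAM row so the two tables stay
        # in sync without hooking into the flush process.
        if self.search_table is None:
            self.search_table = UserSearch()
        setattr(self.search_table, key, value)
        return value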
Overall, if you've been learning SQLAlchemy from another source (like the O'Reilly book), it's really out of date by many years, and I'd be focusing on the current online documentation.

Related

Bulk insert using sqlalchemy Engine

Is there a way to bulk-insert/update values into a Microsoft SQL Server database using Engine?
I have read several (very) old posts regarding this, and it seems it was not very easy to do back then.
E.g., in some examples we need to create a class, add instances of those classes to a session, and at last commit the session.
Isn't there a way like (pseudo) this:
from sqlalchemy import String, Integer, Float

values = [(1, "hello", 2.5), (2, "world", 10.5)]  # values to insert
table = "my_schema.my_table"  # table name
col = ["id", "statement", "ratio"]  # names of the columns in the database
type = [Integer, String, Float]  # type of each value

engine = sqlalchemy.create_engine(connection_string)

with engine.session():
    try:
        engine.bulk_insert(table, values, col, type)
    except:
        engine.rollback()
or something else, instead of looping over engine.execute("INSERT INTO ...")?
I know I can use pandas.DataFrame.to_sql, but since I want to be able to roll back in case of errors etc., I won't use that.
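For reference, a minimal sketch of how this can look with SQLAlchemy Core: reflect the table, then hand a list of rows to a single insert inside a transaction that rolls back on error (connection_string is assumed to be defined):

import sqlalchemy
from sqlalchemy import MetaData, Table

engine = sqlalchemy.create_engine(connection_string)
metadata = MetaData()
my_table = Table("my_table", metadata, schema="my_schema",
                 autoload_with=engine)  # reflect columns from the database

rows = [
    {"id": 1, "statement": "hello", "ratio": 2.5},
    {"id": 2, "statement": "world", "ratio": 10.5},
]

# engine.begin() commits on success and rolls back automatically if an
# exception is raised inside the block.
with engine.begin() as conn:
    conn.execute(my_table.insert(), rows)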

How do I handle database columns with reserved characters in SQLAlchemy ORM?

I'm somewhat new to SQLAlchemy ORM, and I'm trying to select and then store data from a column within a view that has a forward slash in the name of the column.
The databases are mapped using the following:
source_engine = create_engine("...")
base = automap_base()
base.prepare(source_engine, reflect=True)
metadata = MetaData(self.engine)
table_1 = Table("table_1", self.metadata, autoload=True)
The second destination table is mapped the same way.
Then, I connect to this database, and I'm trying to select information from columns to copy into a different database:
source_table_session = Session(source_engine)
dest_table_session = Session(dest_engine)

table_1_data = source_table_session.query(table_1)
for instance in table_1_data:
    newrow = dest_table.base.classes.dest_table()
    newrow.Column1 = instance.Column1  # This works fine, column has a normal name
But then, the problem is that there's a column in the view with the name "Slot/Port"
With a direct query, you can do:
select "Slot/Port" from source_database;
But in ORM, you can't just type:
newrow.Slot/Port = instance.Slot/Port
or
newrow.'Slot/Port' = instance.'Slot/Port'
That isn't going to be correct, and the following doesn't work either:
newrow.SlotPort = instance.SlotPort
AttributeError: 'result' object has no attribute 'SlotPort'
I have no control over how columns are named in the source database.
I find the SQLAlchemy documentation to be generally fragmented (only showing small snippets of code) and confusing, so I'm not sure if this kind of thing is addressed or not. Is there a way to get around this limitation, or are the columns perhaps already mapped to a valid name without a slash, or is there a way to do so?
Thanks to @DeepSpace for helping me find the answer.
Instead of
newrow.whatever = instance.whatever
I needed:
setattr(newrow, 'Slot/Port', getattr(instance, 'Slot/Port'))
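If several columns have names like this, the same trick generalizes; a sketch that copies every column of the reflected table, whatever its name:

# setattr/getattr accept any column name, valid Python identifier or not.
for column in table_1.columns:
    setattr(newrow, column.name, getattr(instance, column.name))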

select in (select ..) using ORM django

How can I make a query
select name where id in (select id from ...)
using the Django ORM? I could do this with one for loop to obtain a result set and another loop to use it, but that is not practical; it is simpler to do it in a single SQL query, and it should be just as simple to express in Python.
I have these models:
class Invoice(models.Model):
    factura_id = models.IntegerField(unique=True)
    created_date = models.DateTimeField()
    store_id = models.ForeignKey(Store, blank=False)

class invoicePayments(models.Model):
    invoice = models.ForeignKey(Factura)
    date = models.DateTimeField()  # auto_now = True
    money = models.DecimalField(max_digits=9, decimal_places=0)
I need to get the payments of an invoice, filtered by store_id and payment date.
In MySQL I would make this query using a select in (select ...), which is a simple query, but the only way I can think of to do something similar with the Django ORM is a for loop, and I don't like that idea:
invoiceXstore = invoice.objects.filter(local=3)
for a in invoiceXstore:
    payments = invoicePayments.objects.filter(invoice=a.id,
        date__range=["2016-05-01", "2016-05-06"])
You can traverse ForeignKey relations using double underscores (__) in Django ORM. For example, your query could be implemented as:
payments = invoicePayments.objects.filter(invoice__store_id=3,
    date__range=["2016-05-01", "2016-05-06"])
I guess you renamed your classes to English before posting here. In this case, you may need to change the first part to factura__local=3.
As a side note, it is recommended to rename your model class to InvoicePayments (with a capital I) to be more compliant with PEP8.
Your mysql raw query is a sub query.
select name where id in (select id from ...)
In MySQL this will usually be slower than an INNER JOIN (refer to http://dev.mysql.com/doc/refman/5.7/en/rewriting-subqueries.html), so you can rewrite your raw query as an INNER JOIN, which will look like this:
SELECT ip.* FROM invoicepayments ip INNER JOIN invoice i
    ON ip.invoice_id = i.id
You can then use a WHERE clause to apply the filtering.
The looping query approach you have tried does work, but it is not recommended because it results in a large number of queries being executed. Instead you can do:
InvoicePayments.objects.filter(invoice__local=3,
    date__range=("2016-05-01", "2016-05-06"))
I am not quite sure what 'local' stands for because your model does not show any field like that. Please update your model with the correct field or edit the query as appropriate.
To learn about __range see https://docs.djangoproject.com/en/1.9/ref/models/querysets/#range
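For completeness, the literal ORM translation of the IN (SELECT ...) pattern is to pass a queryset to __in; Django renders it as a single query containing a subquery (a sketch against the models above):

# The inner queryset is not evaluated separately; it becomes the subquery.
payments = invoicePayments.objects.filter(
    invoice__in=Invoice.objects.filter(store_id=3),
    date__range=("2016-05-01", "2016-05-06"))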

Having two serial keys in postgresql via sqlalchemy

I have an unusual challenge. I'm modifying a table to be able to join with two other legacy groups of PostgreSQL tables.
One group pretty much requires that each record in the table have a unique integer. So, the following field definition would work:
numeric_id = sql.Column(sql.Integer, primary_key=True)
The other group of tables all use UUID fields for the expected JOIN requests. So the following field definition would work:
uu_account_id = sql.Column(UUID(as_uuid=True), primary_key=True)
But, clearly, I can't have two primary keys. So one of them needs to not be a primary key. It would be nice to simply have both still be automatically assigned when a new record is made.
Any suggestions?
I'm sure I can do a quick hack, but I'm curious if there is a nice clean answer.
(And no: changing the other tables is NOT an option. Way too much legacy code.)
Make the uuid column the primary key, like usual.
Define the other column as having serial type and unique. In SQL I'd write
create table mytable (
    mytable_id uuid primary key default uuid_generate_v4(),
    mytable_legacy_id serial unique not null,
    ... other cols ...
);
so you just need to do the SQLAlchemy equivalent, whatever that is, of a not null, unique field.
Note that "serial" is just shorthand for
create sequence tablename_colname_seq;

create table tablename (
    colname integer default nextval('tablename_colname_seq'),
    ... cols ...
);

alter sequence tablename_colname_seq owned by tablename.colname;
so if you can't make sqlalchemy recognise that you can have a serial field that isn't a primary key, you can do it this way instead.
Between SQLAlchemy, alembic (which I also use), and PostgreSQL, this turned out to be tricky.
If creating a brand new table from scratch, the following works for my numeric_id column:
numeric_id = sql.Column(sql.Integer, sql.Sequence('mytable_numeric_id_seq'),
                        unique=True, nullable=False)
(It is possible that the unique=True and nullable=False are overkill.)
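Put together with the UUID column, the full model might look something like this (a sketch; the table name and Base are assumed, and sql is the module alias from the question):

import uuid
import sqlalchemy as sql
from sqlalchemy.dialects.postgresql import UUID

class MyTable(Base):
    __tablename__ = 'mytable'
    # Primary key for the UUID-based group of legacy tables.
    uu_account_id = sql.Column(UUID(as_uuid=True), primary_key=True,
                               default=uuid.uuid4)
    # Unique integer for the other legacy group, fed by its own sequence.
    numeric_id = sql.Column(sql.Integer,
                            sql.Sequence('mytable_numeric_id_seq'),
                            unique=True, nullable=False)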
However, if modifying an existing table, the sequence itself fails to get created. Or, at least, I couldn't get it to work.
The sequence can be created by hand, of course. Or, if using 'alembic' to make distributed migrations easier, add:
from alembic import op
from sqlalchemy.schema import Sequence, CreateSequence

def upgrade():
    op.execute(CreateSequence(Sequence("mytable_numeric_id_seq")))

to the version script created by alembic.
Special thanks to Craig for his help.
(Note: most of the SQLAlchemy examples on the net use db. as the module alias rather than sql. It's the same thing really; I used sql. simply because I'm already using db. for MongoDB.)

Filter SQLAlchemy one-to-many with "does not contain"

I'm attempting to query and filter on a one-to-many relationship and cannot seem to figure out how to do this.
Here are my mappings (trimmed for brevity):
class Bug(Base):
    __tablename__ = 'bug'
    id = Column('bug_id', Integer, primary_key=True)
    tags = relationship('Tag', backref='bug')

class Tag(Base):
    __tablename__ = 'tag'
    id = Column('tag_id', Integer, primary_key=True)
    name = Column('tag_name', String)
    bug_id = Column('bug_id', ForeignKey('bug.bug_id'))
I want to be able to find all bugs that do not have tag with name "foo".
You can use the any() operator on the relationship.
from sqlalchemy import not_

bugs_without_foo = session.query(Bug).filter(
    not_(Bug.tags.any(Tag.name == 'foo'))
).all()
It's nicer to look at, but it could be less efficient over very large data sets than the subquery from Dan Lenski's answer.
I am not sure exactly what the Tag table is supposed to represent for you, but it is odd that your schema associates each Tag with exactly one Bug. If you want to tag multiple Bugs with a tag of the same name, you will be creating multiple rows in the Tag class with the same name. This would seem to violate the 3rd normal form.
The standard way to describe a tag cloud in a database would be to use a many-to-many relationship with a secondary "association" table that associates (bug,tag) pairs. The SQLAlchemy docs have a very nice tutorial on this pattern.
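That pattern would look roughly like this (a sketch; table and column names are assumed):

from sqlalchemy import Table, Column, ForeignKey

# Association table holding (bug, tag) pairs; each tag name is stored once.
bug_tags = Table('bug_tags', Base.metadata,
    Column('bug_id', ForeignKey('bug.bug_id'), primary_key=True),
    Column('tag_id', ForeignKey('tag.tag_id'), primary_key=True))

# On Bug, the tags relationship then routes through the association
# table instead of a bug_id column on Tag:
#     tags = relationship('Tag', secondary=bug_tags, backref='bugs')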
If you stick with your schema as-is, there are several ways to do it.
Client-side filtering
This is obviously inefficient but it is easy to understand. You go through the bugs one by one, go through their tags one by one, and eliminate the bugs where tag.name=="foo":
non_foo_bugs = [bug for bug in session.query(Bug)
                if not any(tag.name == "foo" for tag in bug.tags)]
Two queries
Find all distinct bugs that are tagged "foo", and then find the complement of that set.
This version uses exactly two queries of the database:
foo_bugs = [t.bug_id for t in session.query(Tag).filter_by(name="foo").distinct()]
session.query(Bug).filter(~Bug.id.in_(foo_bugs))
One query with a subquery
Same as the above, but make foo_bugs a subquery, since there's no reason to fetch its contents on the client side:
foo_bugs = session.query(Tag.bug_id).filter_by(name="foo").distinct().subquery()
session.query(Bug).filter(~Bug.id.in_(foo_bugs))
This would be an uncorrelated subquery, so from the server point of view it should be optimized just about the same as two separate queries.
