I have an unusual challenge. I'm modifying a table so that it can join against two separate groups of legacy PostgreSQL tables.
One group pretty much requires that each record in the table have a unique integer. So, the following field definition would work:
numeric_id = sql.Column(sql.Integer, primary_key=True)
The other group of tables all use UUID fields for the expected JOIN requests. So the following field definition would work:
uu_account_id = sql.Column(UUID(as_uuid=True), primary_key=True)
But, clearly, I can't have two primary keys. So one of them needs to not be a primary key. It would be nice to simply have both still be automatically assigned when a new record is made.
Any suggestions?
I'm sure I can do a quick hack, but I'm curious if there is a nice clean answer.
(And no: changing the other tables is NOT an option. Way too much legacy code.)
Make the uuid column the primary key, like usual.
Define the other column as having serial type and unique. In SQL I'd write
create table mytable (
mytable_id uuid primary key default uuid_generate_v4(),
mytable_legacy_id serial unique not null,
... other cols ...
);
so you just need to do the SQLAlchemy equivalent, whatever that is, of a not null, unique field.
Note that "serial" is just shorthand for
create sequence tablename_colname_seq;
create table tablename (
colname integer not null default nextval('tablename_colname_seq'),
... cols ...
);
alter sequence tablename_colname_seq owned by tablename.colname;
so if you can't make sqlalchemy recognise that you can have a serial field that isn't a primary key, you can do it this way instead.
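For reference, a rough SQLAlchemy sketch of that whole layout (assuming a declarative Base, the sql alias from the question, and a Python-side uuid4 default in place of uuid_generate_v4()):
import uuid
import sqlalchemy as sql
from sqlalchemy.dialects.postgresql import UUID

class MyTable(Base):
    __tablename__ = 'mytable'

    uu_account_id = sql.Column(UUID(as_uuid=True), primary_key=True,
                               default=uuid.uuid4)
    numeric_id = sql.Column(sql.Integer,
                            sql.Sequence('mytable_numeric_id_seq'),
                            unique=True, nullable=False)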
Between SQLAlchemy, alembic (which I also use), and PostgreSQL, this turned out to be tricky.
If creating a brand new table from scratch, the following works for my numeric_id column:
numeric_id = sql.Column(sql.Integer, sql.Sequence('mytable_numeric_id_seq'), unique=True, nullable=False)
(It is possible that the unique=True and nullable=False are overkill.)
However, if modifying an existing table, the sequence itself fails to get created. Or, at least, I couldn't get it to work.
The sequence can be created by hand, of course. Or, if using 'alembic' to make distributed migrations easier, add:
from sqlalchemy.schema import Sequence, CreateSequence
def upgrade():
op.execute(CreateSequence(Sequence("mytable_numeric_id_seq")))
to the migration script generated by alembic.
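If the column itself also needs to be added in the same migration, a fuller upgrade() might look roughly like this (a sketch; the constraint name is illustrative, and the server_default lets PostgreSQL backfill existing rows from the new sequence):
from alembic import op
import sqlalchemy as sa
from sqlalchemy.schema import Sequence, CreateSequence

def upgrade():
    # the sequence is not autogenerated, so create it explicitly
    op.execute(CreateSequence(Sequence("mytable_numeric_id_seq")))
    op.add_column(
        "mytable",
        sa.Column("numeric_id", sa.Integer(), nullable=False,
                  server_default=sa.text("nextval('mytable_numeric_id_seq')")),
    )
    op.create_unique_constraint("uq_mytable_numeric_id", "mytable", ["numeric_id"])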
Special thanks to Craig for his help.
(Note: most of the SQLAlchemy examples on the net use "db." as the module alias rather than "sql.". Same thing, really; I used "sql." simply because "db." is already taken by MongoDB in my code.)
Related
When I create an instance of an object (in my example below, a Company), I want to automagically create default related objects. One way is to use a per-row, after-insert trigger, but I'm trying to avoid that route and use CTEs, which are easier to read and maintain. I have this SQL working (the underlying db is PostgreSQL; the only things you need to know about table company are that its primary key is id SERIAL PRIMARY KEY and that it has one other required column, name VARCHAR NOT NULL):
with new_company as (
-- insert my company row, returning the whole row
insert into company (name)
values ('Acme, Inc.')
returning *
),
other_related as (
-- herein I join to `new_company` and create default related rows
-- in other tables. Here we use, effectively, a no-op - what it
-- actually does is not germane to the issue.
select id from new_company
)
-- Having created the related rows, we return the row we inserted into
-- table `company`.
select * from new_company;
The above works like a charm, and with the recently added Select.add_cte() (new in SQLAlchemy 1.4.21) I can write it with the following Python:
import sqlalchemy as sa
from myapp.models import Company
new_company = (
sa.insert(Company)
.values(name='Acme, Inc.')
.returning(Company)
.cte(name='new_company')
)
other_related = (
sa.select(sa.text('new_company.id'))
.select_from(new_company)
.cte('other_related')
)
fetch_company = (
sa.select(sa.text('* from new_company'))
.add_cte(other_related)
)
print(fetch_company)
And the output is:
WITH new_company AS
(INSERT INTO company (name) VALUES (:param_1) RETURNING company.id, company.name),
other_related AS
(SELECT new_company.id FROM new_company)
SELECT * from new_company
Perfect! But when I execute the above query I get back a Row:
>>> result = session.execute(fetch_company).fetchone()
>>> print(result)
(26, 'Acme, Inc.')
I can create an instance with:
>>> result = session.execute(fetch_company).fetchone()
>>> company = Company(**result)
But this instance, if added to the session, is in the wrong state (pending), and if I flush and/or commit I get a duplicate key error, because the company is already in the database.
If I try using Company in the select list, I get a bad query because SQLAlchemy automagically sets the from-clause, and I cannot figure out how to clear it or explicitly set it to use my CTE.
I'm looking for one of several possible solutions:
annotate an arbitrary query in some way to say, "build an instance of MyModel, but use this table/alias", e.g., query = sa.select(Company).select_from(new_company.alias('company'), reset=True).
tell a session that an instance is persistent regardless of what the session thinks about the instance, e.g., company = Company(**result); session.add(company, force_state='persistent')
Obviously I could do another round-trip to the db with a call to session.merge() (as discussed in early comments of this question) so the instance ends up in the correct state, but that seems terribly inefficient especially if/when used to return lists of instances.
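For what it's worth, something close to the first idea may already be possible by mapping the entity onto the CTE with orm.aliased(), which accepts an arbitrary selectable; a sketch I have not verified end to end:
from sqlalchemy.orm import aliased

company_from_cte = aliased(Company, new_company)   # map Company onto the CTE
fetch_company = sa.select(company_from_cte).add_cte(other_related)
company = session.execute(fetch_company).scalars().one()
If it works, the returned instance should already be persistent in the session, since it was loaded through the ORM rather than constructed from a plain Row.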
I am trying to select a subset of columns from a table with SQLAlchemy's load_only function. Unfortunately it doesn't seem to return only the columns specified in the function call - specifically, it also seems to fetch the primary key (in my case, an auto_increment id field).
A simple example: if I use this statement to build a query:
query = session.query(table).options(load_only('col_1', 'col_2'))
Then the query.statement looks like this:
SELECT "table".id, "table"."col_1", "table"."col_2"
FROM "table"
This is not what I would have expected, given I've specified the "only" columns to use... Where did the id come from, and is there a way to remove it?
Deferring the primary key would not make sense when querying complete ORM entities, because an entity must have an identity so that a unique row can be identified in the database table. So the query includes the primary key even though you have load_only(). If you want just the data, query for it specifically:
session.query(table.col_1, table.col_2).all()
The results are keyed tuples that you can treat like you would the entities in many cases.
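For example, attribute access still works on those rows (using the column names from the question):
for row in session.query(table.col_1, table.col_2):
    print(row.col_1, row.col_2)   # keyed tuple, no id column fetched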
There actually was an issue where having load_only() did remove the primary key from the select list, and it was fixed in 0.9.5:
[orm] [bug] Modified the behavior of orm.load_only() such that primary key columns are always added to the list of columns to be "undeferred"; otherwise, the ORM can't load the row's identity. Apparently one can still defer the mapped primary keys explicitly and the ORM will fail; that hasn't been changed. But as load_only() is essentially saying "defer all but X", it's more critical that PK columns not be part of this deferral.
SQLAlchemy: how should I define a column's default value computed using a reference to the table containing that column?
Let's use these tables as an example (SQLite):
CREATE TABLE department (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL
);
CREATE TABLE employee (
id INTEGER,
name TEXT NOT NULL,
department_id INTEGER NOT NULL,
FOREIGN KEY (department_id) REFERENCES department(id),
PRIMARY KEY (id, department_id)
);
I want each employee's ID to be unique only within their department. On INSERT, a new employee ID should be generated that is one greater than the current highest employee ID in that department.
Put in raw SQL, here's what I'm looking to do:
INSERT INTO employee(
id,
name,
department_id
)
VALUES (
(
SELECT coalesce(MAX(id),0)+1
FROM employee
WHERE department_id=?
),
?,
?
)
What's the best way to do this using SQLAlchemy?
I think I'm looking for something similar to the third column example in here. Something like this:
employee_table = Table("employee", meta,
Column('id', Integer, primary_key=True, autoincrement=False,
default=keyvalues.select(
func.max(employee_table.c.id)
).filter_by(department_id=??))
Column('department_id', Integer, ForeignKey('department.id'),
nullable=False, primary_key=True, autoincrement=False)
Column('name', String(127), nullable=False),
)
That doesn't work, of course: I don't have a reference to the employee table yet (since I'm still defining it) and because I don't know how to reference the "current" department_id in the filter_by clause. (There are quite possibly other problems, too)
Alternatively, if it is not possible to do this through the Python API, is there any way I can just specify a column's default value (applied at INSERT time) using raw SQL? Or do I need to use raw SQL for the entire insert?
Note: my situation is basically the same as in this question, but the solution I'm looking for is different: I want to use a nested SELECT in my inserts rather than create a DB trigger.
EDIT
I'm getting closer to solving the problem, but I'm still not there yet.
agronholm in #sqlalchemy explained that with default alone there would be no way to fill in the department_id: although it's possible to use a selectable as the default on INSERT, there is no way to fill in its parameters (the department_id).
Instead, agronholm suggested the best solution is to create the subquery within the constructor. By assigning the query (not running it and assigning the result!), the id will be fetched in a sub-SELECT. This avoids the race condition that would result from performing the SELECT first on the Python side, and then assigning the result.
I'm trying out something like this:
def __init__(self, department, name):
    self.id = db.select(
        db.func.max(Employee.id)
    ).filter_by(department_id=department.id).as_scalar()
    self.department = department
    self.name = name
Unfortunately, this also doesn't work, because the calculated column is used as part of the primary key. It throws:
InvalidRequestError: Instance <XXXXX at 0x3d15d10> cannot be refreshed - it's not persistent and does not contain a full primary key.
In my original raw-SQLite version, I would access the newly-created row with the cursor's lastrowid. Is something similar possible in SQLAlchemy?
I ran into a similar problem and finally arrived at this solution. There's still room for improvement -- it does the SELECT before the INSERT rather than inlining it -- but it seems to work.
from sqlalchemy import sql
...
def default_employee_id(context):
return context.connection.execute(
sql.select(
[sql.func.ifnull(sql.func.max(employee_table.c.id), 0) + 1]
).where(
employee_table.c.department_id==context.current_parameters['department_id']
)
).scalar()
employee_table = Table("employee", meta,
Column('id', Integer, primary_key=True, autoincrement=False,
default=default_employee_id),
Column('department_id', Integer, ForeignKey('department.id'),
nullable=False, primary_key=True, autoincrement=False),
Column('name', String(127), nullable=False)
)
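For example (a sketch, assuming an engine bound to the same metadata), a plain insert then picks up the department-scoped id automatically:
with engine.begin() as conn:
    conn.execute(
        employee_table.insert(),
        {"name": "Alice", "department_id": 1},
    )
    # default_employee_id() ran with department_id=1 available in
    # context.current_parameters and supplied the id column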
The next thing I would try is a trigger, even though the docs say it's a bad idea for a primary key.
Hooking into the "before_flush" event would probably have the same pre-select issue.
It may also be possible to alter or replace context.compiled argument in order to inject the SELECT into the INSERT, but that seems extreme for what we're trying to accomplish.
I would like to be able to full text search across several text fields of one of my SQLAlchemy mapped objects. I would also like my mapped object to support foreign keys and transactions.
I plan to use MySQL to run the full text search. However, I understand that MySQL can only run full text search on a MyISAM table, which does not support transactions and foreign keys.
In order to accomplish my objective I plan to create two tables. My code will look something like this:
class User(Base):
__tablename__ = 'users'
id = Column(Integer, primary_key=True)
name = Column(String(50))
description = Column(Text)
users_myisam = Table('users_myisam', Base.metadata,
Column('id', Integer),
Column('name', String(50)),
Column('description', Text),
mysql_engine='MyISAM')
conn = Base.metadata.bind.connect()
conn.execute("CREATE FULLTEXT INDEX idx_users_ftxt \
on users_myisam (name, description)")
Then, to search I will run this:
q = 'monkey'
ft_search = users_myisam.select("MATCH (name,description) AGAINST ('%s')" % q)
result = ft_search.execute()
for row in result: print row
This seems to work, but I have a few questions:
Is my approach of creating two tables to solve my problem reasonable? Is there a standard/better/cleaner way to do this?
Is there a SQLAlchemy way to create the fulltext index, or am I best to just directly execute "CREATE FULLTEXT INDEX ..." as I did above?
Looks like I have a SQL injection problem in my search/match against query. How can I do the select the "SQLAlchemy way" to fix this?
Is there a clean way to join the users_myisam select/match against right back to my user table and return actual User instances, since this is what I really want?
In order to keep my users_myisam table in sync with my mapped object user table, does it make sense for me to use a MapperExtension on my User class, and set the before_insert, before_update, and before_delete methods to update the users_myisam table appropriately, or is there some better way to accomplish this?
Thanks,
Michael
Is my approach of creating two tables to solve my problem reasonable?
Is there a standard/better/cleaner way to do this?
I've not seen this use case attempted before, as developers who value transactions and constraints tend to use Postgresql in the first place. I understand that may not be possible in your specific scenario.
Is there a SQLAlchemy way to create the fulltext index, or am I best
to just directly execute "CREATE FULLTEXT INDEX ..." as I did above?
conn.execute() is fine, though if you want something slightly more integrated you can use the DDL() construct; read through http://docs.sqlalchemy.org/en/rel_0_8/core/schema.html?highlight=ddl#customizing-ddl for details.
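For example, something along these lines attaches the index creation to the table's create event (a sketch using the table from the question):
from sqlalchemy import DDL, event

event.listen(
    users_myisam,
    "after_create",
    DDL("CREATE FULLTEXT INDEX idx_users_ftxt "
        "ON users_myisam (name, description)")
)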
Looks like I have a SQL injection problem in my search/match against query. How can I do the
select the "SQLAlchemy way" to fix this?
note: this recipe is only for MATCH against multiple columns simultaneously - if you have just one column, use the match() operator more simply.
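For a single column, something like the following should emit a MATCH ... AGAINST expression on the MySQL dialect (a sketch using the table from the question):
ft_search = users_myisam.select(users_myisam.c.description.match(q))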
most basically you could use the text() construct:
from sqlalchemy import text, bindparam
users_myisam.select(
text("MATCH (name,description) AGAINST (:value)",
bindparams=[bindparam('value', q)])
)
more comprehensively you could define a custom construct:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import ClauseElement
from sqlalchemy import literal
class Match(ClauseElement):
def __init__(self, columns, value):
self.columns = columns
self.value = literal(value)
@compiles(Match)
def _match(element, compiler, **kw):
return "MATCH (%s) AGAINST (%s)" % (
", ".join(compiler.process(c, **kw) for c in element.columns),
compiler.process(element.value)
)
my_table.select(Match([my_table.c.a, my_table.c.b], "some value"))
docs:
http://docs.sqlalchemy.org/en/rel_0_8/core/compiler.html
Is there a clean way to join the users_myisam select/match against right back
to my user table and return actual User instances, since this is what I really want?
you should probably create a UserMyISAM class, map it just like User, and use relationship() to link the two classes together; then simple operations like this are possible:
query(User).join(User.search_table).\
filter(Match([UserSearch.x, UserSearch.y], "some value"))
In order to keep my users_myisam table in sync with my mapped object
user table, does it make sense for me to use a MapperExtension on my
User class, and set the before_insert, before_update, and
before_delete methods to update the users_myisam table appropriately,
or is there some better way to accomplish this?
MapperExtensions are deprecated, so you'd at least use the event API, and in most cases we want to try applying object mutations outside of the flush process. In this case, I'd be using the constructor for User, or alternatively the init event, as well as a basic @validates decorator which will receive values for the target attributes on User and copy those values into User.search_table.
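A rough sketch of the @validates idea (UserSearch and the search_row relationship are hypothetical names here, standing in for a class mapped over users_myisam and a relationship() linking it to User; neither is defined in the snippets above):
from sqlalchemy.orm import validates

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    description = Column(Text)
    # search_row = relationship(UserSearch, uselist=False, ...)  # hypothetical

    @validates('name', 'description')
    def _mirror_to_search(self, key, value):
        # copy each assigned value onto the shadow MyISAM row
        if self.search_row is None:
            self.search_row = UserSearch()
        setattr(self.search_row, key, value)
        return value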
Overall, if you've been learning SQLAlchemy from another source (like the O'Reilly book), it's really out of date by many years, and I'd focus on the current online documentation.
I'm working on a Python program that interacts with a simple sqlite database. I'm trying to build a search tool that will be able to, depending on user input, interactively "filter" the database and then return rows (items) that match the search. For example...
My Python program (through if statements, cgi.FieldStorage(), and whatnot) should be able to accept user input and then hunt through the database. Here's the general code for the program:
import cgitb; cgitb.enable()
import cgi
import sys
import sqlite3 as lite
con = lite.connect('bikes.db')
form = cgi.FieldStorage()
terrain_get = form.getlist("terrain")
terrains = ",".join(terrain_get)
handlebar_get = form.getlist("handlebar")
handlebars = ",".join(handlebar_get)
kickstand = form['kickstand'].value
As you can see, that part receives the user's input; it works fine (I think). Next, here's where I need help:
if 'dirtrocky' not in terrains:
FILTER the database to not return items that have 'dirtrocky' in their terrain field
And then later in the program, I want to be able to extend on my filter:
if 'drop' not in handlebars:
FILTER the database to, much like in previous one, not return items that have 'drop' in their 'handlebar' field
My question is, HOW can I filter the database? My end result should ideally be a tuple of IDs for rows that are left after I 'filter away' the above.
Thanks!
First, you should define your database schema. The most common approach is to create a fully normalized database, something like:
CREATE TABLE bikes (
bike_id INTEGER PRIMARY KEY AUTOINCREMENT,
manufacturer VARCHAR(20),
price FLOAT,
...
);
CREATE TABLE terrains (
terrain_id INTEGER PRIMARY KEY AUTOINCREMENT,
terrain VARCHAR(20),
...
);
CREATE TABLE handlebars (
handlebar_id INTEGER PRIMARY KEY AUTOINCREMENT,
handlebar VARCHAR(20),
...
);
CREATE TABLE bike_terrain (
bike_id INTEGER,
terrain_id INTEGER
);
CREATE TABLE bike_handlebar (
bike_id INTEGER,
handlebar_id INTEGER
);
Note that the bikes table does not contain anything about terrain types or handlebars: that info is stored in connecting tables like bike_terrain.
This fully normalized design makes the database a little more cumbersome to populate, but on the other hand it makes it much easier to query.
How do you query it for multi-valued fields?
You will need to construct your SQL statement dynamically, something like this:
SELECT
b.manufacturer,
b.price
FROM bikes b,
terrains t,
bike_terrain bt
WHERE b.bike_id = bt.bike_id
AND t.terrain_id = bt.terrain_id
AND t.terrain IN ('mountain', 'dirt', ...) -- this will be built dynamically
... -- add more for handlebars, etc...
Almost the whole WHERE clause will have to be built and appended dynamically, by constructing your SQL statement on the fly.
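For example, a sketch from the sqlite3 side, reusing the con connection and terrain_get list from the question and the schema above (the values stay as bound parameters; only the placeholder list is built dynamically):
wanted_terrains = [t for t in terrain_get if t]          # e.g. ['mountain', 'dirt']
if wanted_terrains:
    placeholders = ','.join('?' for _ in wanted_terrains)
    sql = """
        SELECT DISTINCT b.bike_id
        FROM bikes b
        JOIN bike_terrain bt ON bt.bike_id = b.bike_id
        JOIN terrains t ON t.terrain_id = bt.terrain_id
        WHERE t.terrain IN (%s)
    """ % placeholders
    matching_ids = tuple(row[0] for row in con.execute(sql, wanted_terrains))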
I highly recommend getting a good SQLite GUI to work on this. On Windows, SQLite Expert Personal is superb, and on Linux sqliteman is great.
Once your database is populated and grows beyond a few hundred rows, you should add proper indexes so it stays fast. Good luck!