We're converting a codebase to SqlAlchemy, where there's an existing database that we can modify but not completely replace.
There's a collection of widgets, and for each widget we keep track of the 20 most similar other widgets (this is a directional relationship, i.e. widget_2 can appear in widget_1's most similar widgets, but not vice versa).
There's a widget table which has a widget_id field and some other things.
There's a similarity table which has first_widget_id, second_widget_id and similarity_score. We only save the 20 most similar widgets in the database, so that every widget_id appears exactly 20 times as first_widget_id.
first_widget_id and second_widget_id have foreign keys pointing to the widget table.
We're using SQLAlchemy's automap functionality, so a Widget object has a Widget.similarity_collection field. However, for a specified widget_id, it only includes items where second_widget_id == widget_id, whereas we want first_widget_id == widget_id. I understand that SQLAlchemy has no way of knowing which of the two it should pick.
Can we tell it somehow?
EDIT: as per the comment, here are more details on the models:
CREATE TABLE IF NOT EXISTS `similarity` (
`first_widget_id` int(6) NOT NULL,
`second_widget_id` int(6) NOT NULL,
`score` int(5) NOT NULL,
PRIMARY KEY (`first_widget_id`,`second_widget_id`),
KEY `first_widget_id` (`first_widget_id`),
KEY `second_widget_id_index` (`second_widget_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `similarity`
ADD CONSTRAINT `similar_first_widget_id_to_widgets_foreign_key` FOREIGN KEY (`first_widget_id`) REFERENCES `widgets` (`widget_id`) ON DELETE CASCADE ON UPDATE CASCADE,
ADD CONSTRAINT `similar_second_widget_id_to_widgets_foreign_key` FOREIGN KEY (`second_widget_id`) REFERENCES `widgets` (`widget_id`) ON DELETE CASCADE ON UPDATE CASCADE;
CREATE TABLE IF NOT EXISTS `widgets` (
`widget_id` int(6) NOT NULL AUTO_INCREMENT,
`widget_name` varchar(70) NOT NULL,
PRIMARY KEY (`widget_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=13179 ;
And using this python code to initialize SQLAlchemy:
base = automap_base()
engine = create_engine(
    'mysql://%s:%s@%s/%s?charset=utf8mb4' % (
        config.DB_USER, config.DB_PASSWD, config.DB_HOST, config.DB_NAME
    ), echo=False
)
# reflect the tables
base.prepare(engine, reflect=True)
Widgets = base.classes.widgets
Now when we do something like:
session.query(Widgets).filter_by(widget_id=1).one().similarity_collection
We get sqlalchemy.ext.automap.similarity objects for which second_widget_id == 1, whereas we want first_widget_id == 1.
You can override how the similarity_collection joins, even when automapping, with an explicit class definition that passes foreign_keys to the relationship:
from sqlalchemy.orm import relationship

base = automap_base()
engine = create_engine(
    'mysql://%s:%s@%s/%s?charset=utf8mb4' % (
        config.DB_USER, config.DB_PASSWD, config.DB_HOST, config.DB_NAME
    ), echo=False
)

# The class definition ensures a certain join path for the relationship;
# the rest of the mapping is automapped upon reflection.
class Widgets(base):
    __tablename__ = 'widgets'

    similarity_collection = relationship(
        'similarity', foreign_keys='similarity.first_widget_id')

base.prepare(engine, reflect=True)
If you wish to also control the relationships created in similarity, for neat association proxies and the like, use the same pattern.
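A self-contained sketch of the whole pattern against a throwaway SQLite stand-in for the MySQL schema (table and column names follow the question; uses the SQLAlchemy 1.4+ prepare(autoload_with=...) spelling):

```python
from sqlalchemy import create_engine, text
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session, relationship

# Throwaway SQLite stand-in for the MySQL schema from the question.
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE widgets ("
        "widget_id INTEGER PRIMARY KEY, widget_name VARCHAR(70))"))
    conn.execute(text(
        "CREATE TABLE similarity ("
        "first_widget_id INTEGER NOT NULL REFERENCES widgets (widget_id), "
        "second_widget_id INTEGER NOT NULL REFERENCES widgets (widget_id), "
        "score INTEGER NOT NULL, "
        "PRIMARY KEY (first_widget_id, second_widget_id))"))

base = automap_base()

class Widgets(base):
    __tablename__ = "widgets"

    # Pin the join path: the collection follows first_widget_id.
    similarity_collection = relationship(
        "similarity", foreign_keys="similarity.first_widget_id")

base.prepare(autoload_with=engine)

with Session(engine) as session:
    session.execute(text("INSERT INTO widgets VALUES (1, 'a'), (2, 'b')"))
    session.execute(text(
        "INSERT INTO similarity VALUES (1, 2, 90), (2, 1, 80)"))
    widget = session.query(Widgets).filter_by(widget_id=1).one()
    pairs = [(s.first_widget_id, s.second_widget_id)
             for s in widget.similarity_collection]
    print(pairs)
```

The collection now contains only the row with first_widget_id == 1, i.e. (1, 2); the (2, 1) row no longer leaks in.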
Related
I'm attempting to check if the key of a particular entity (as defined with an ORM class) already exists in a database.
Note that it needs to work regardless of the names of the (key) attributes. It needs to work generically across all ORM models.
Here's the basic structure of what I need:
# User class definition using the ORM
class User(Base):
    __tablename__ = 'User'

    USER_ID = Column(String(50), primary_key=True)
    ...  # other fields exist here, not part of the primary key

# We have a particular ORM instance
# and need to check whether its primary key exists in the table;
# the other fields may have any values.
user_instance = User(USER_ID=1)
# We have a session with DB
session = Session()
# ===
# Now, how do we check whether a User with USER_ID = 1 exists (dynamically - we might have different data)?
# ===
I am aware of sqlalchemy.inspect but I'm not really sure how it would be applied in this case.
Any help is greatly appreciated.
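One generic way to do this (an illustrative sketch, not from the original post): sqlalchemy.inspect gives the instance's mapper, whose primary_key columns can be read in order regardless of what the attributes are named, and Session.get() (SQLAlchemy 1.4+) performs the lookup. The pk_exists helper and the name column are made up for the demo:

```python
from sqlalchemy import Column, String, create_engine, inspect
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "User"

    USER_ID = Column(String(50), primary_key=True)
    name = Column(String(50))  # stand-in for the other, non-key fields

def pk_exists(session, instance):
    """True if a row with this instance's primary key is already in the DB."""
    mapper = inspect(instance).mapper
    # Assemble the PK tuple in mapper order, whatever the attributes are named.
    ident = tuple(
        getattr(instance, mapper.get_property_by_column(col).key)
        for col in mapper.primary_key
    )
    return session.get(mapper.class_, ident) is not None

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add(User(USER_ID="1", name="existing"))
    session.commit()
    found = pk_exists(session, User(USER_ID="1"))
    missing = pk_exists(session, User(USER_ID="2"))
    print(found, missing)  # True False
```

Because the helper only talks to the mapper, it works unchanged for any ORM model, including composite primary keys.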
I need to create a many-to-many relationship between two tables, and I need to specify the foreign_keys option because both tables contain other references to each other.
I tried multiple approaches, declaring the association table both as a class and as a Table object directly.
When I remove the foreign_keys option on both the User and Feature classes it works, but when I add other fields with mappings between these two classes, I get the "multiple paths" exception.
feature_user = Table(
    'feature_user',
    Base.metadata,
    Column('feature_id', Integer, ForeignKey('features.id')),
    Column('user_id', Integer, ForeignKey('users.id')),
)

class Feature(Base):
    __tablename__ = 'features'

    id = getIdColumn()
    # other fields...
    cpm_engineers = relationship(
        'User',
        secondary=feature_user,
        foreign_keys=[feature_user.columns.user_id],
        back_populates='cpm_engineer_of',
    )

class User(Base):
    __tablename__ = 'users'

    id = getIdColumn()
    # other fields...
    cpm_engineer_of = relationship(
        'Feature',
        secondary=feature_user,
        foreign_keys=[feature_user.columns.feature_id],
        back_populates='cpm_engineers',
    )
When creating a new User for a Feature, I get the following error:
sqlalchemy.exc.NoForeignKeysError: Could not determine join condition between parent/child tables on relationship Feature.cpm_engineers - there are no foreign keys linking these tables via secondary table 'feature_user'. Ensure that referencing columns are associated with a ForeignKey or ForeignKeyConstraint, or specify 'primaryjoin' and 'secondaryjoin' expressions.
I believe the issue was that I was trying to use the same table to store many-to-many associations for multiple bindings, e.g. feature-tester-user, feature-developer-user, etc. This is, of course, wrong. I needed either a custom JOIN using a third column (e.g. a new role column) or a separate table for each association (which is what I did in the end).
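A minimal sketch of the role-column variant (names are illustrative, not from the original post): a role discriminator disambiguates the single association table, and the relationship is marked viewonly because a plain secondary write could not fill in the role:

```python
from sqlalchemy import (Column, ForeignKey, Integer, String, Table, and_,
                        create_engine)
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

# A single association table, disambiguated by a 'role' column.
feature_user = Table(
    "feature_user", Base.metadata,
    Column("feature_id", Integer, ForeignKey("features.id"), primary_key=True),
    Column("user_id", Integer, ForeignKey("users.id"), primary_key=True),
    Column("role", String(20), primary_key=True),
)

class Feature(Base):
    __tablename__ = "features"

    id = Column(Integer, primary_key=True)
    # viewonly: writes must supply a role, so they go through the table directly.
    cpm_engineers = relationship(
        "User",
        secondary=feature_user,
        primaryjoin=lambda: and_(Feature.id == feature_user.c.feature_id,
                                 feature_user.c.role == "cpm"),
        secondaryjoin=lambda: User.id == feature_user.c.user_id,
        viewonly=True,
    )

class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add_all([Feature(id=1), User(id=7), User(id=8)])
    session.flush()
    session.execute(feature_user.insert(),
                    [{"feature_id": 1, "user_id": 7, "role": "cpm"},
                     {"feature_id": 1, "user_id": 8, "role": "tester"}])
    cpm_ids = [u.id for u in session.get(Feature, 1).cpm_engineers]
    print(cpm_ids)  # [7]: the tester row is filtered out by role
```

A separate relationship per role (testers, developers, ...) can reuse the same table with a different literal in the primaryjoin.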
I have a table, MenuOptions which represents any option found in a dropdown in my app. Each option can be identified by the menu it is part of (e.g. MenuOptions.menu_name) and the specific value of that option (MenuOptions.option_value).
This table has relationships all across my db and doesn't use foreign keys, so I'm having trouble getting it to mesh with SQLAlchemy.
In SQL it would be as easy as:
SELECT
*
FROM
document
JOIN
menu_options ON menu_options.option_menu_name = 'document_type'
AND menu_options.option_value = document.document_type_id
to define this relationship. However I'm running into trouble when doing this in SQLAlchemy because I can't map this relationship cleanly without foreign keys. In SQLAlchemy the best I've done so far is:
the_doc = db.session.query(Document, MenuOptions).filter(
    Document.id == document_id
).join(
    MenuOptions,
    and_(
        MenuOptions.menu_name == text('"document_type"'),
        MenuOptions.value == Document.type_id
    )
).first()
This does work and does return the correct values, but it returns them as a pair of two separate model objects, so that I have to reference the mapped Document properties via the_doc[0] and the mapped MenuOptions properties via the_doc[1].
Is there a way I can get this relationship returned as a single query object with all the properties on it without using foreign keys or any ForeignKeyConstraint in my model? I've tried add_columns and add_entity but I get essentially the same result.
You can use with_entities to flatten the selected columns of both models into a single row object:
entities = [getattr(Document, c) for c in Document.__table__.columns.keys()] + \
           [getattr(MenuOptions, c) for c in MenuOptions.__table__.columns.keys()]

session.query(Document, MenuOptions).filter(
    Document.id == document_id
).join(
    MenuOptions,
    and_(
        MenuOptions.menu_name == text('"document_type"'),
        MenuOptions.value == Document.type_id
    )
).with_entities(*entities)
I ended up taking a slightly different approach using association_proxy, but if you ended up here from Google then this should help you. In the following example, I store a document_type_id in the document table and hold the corresponding values for that id in a table called menu_options. Normally you would use foreign keys for this, but our menu_options is essentially an in-house lookup table, and it contains relationships to several other tables, so foreign keys are not a clean solution.
By first establishing a relationship via the primaryjoin parameter and then using association_proxy, I can immediately load the document_type based on the document_type_id with the following code:
from sqlalchemy.ext.associationproxy import association_proxy

class Document(db.Model):
    __tablename__ = "document"

    document_type_id = db.Column(db.Integer)
    document_type_proxy = db.relationship(
        "MenuOptions",
        primaryjoin=(
            "and_(MenuOptions.menu_name == 'document_type', "
            "foreign(Document.document_type_id) == MenuOptions.value)"
        ),
        lazy="immediate",
        viewonly=True,
    )
If all you need is a mapped relationship without the use of foreign keys within your database, then this will do just fine. If, however, you want to be able to access the remote attribute (in this case the document_type) directly as an attribute on the initial class (in this case Document) then you can use association_proxy to do so by simply passing the name of the mapped relationship and the name of the remote property:
document_type = association_proxy("document_type_proxy", "document_type")
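A self-contained plain-SQLAlchemy version of the whole pattern, assuming an illustrative MenuOptions model (foreign() marks the un-constrained local column so SQLAlchemy can infer the join direction without a database-level FK):

```python
from sqlalchemy import Column, Integer, String, and_, create_engine
from sqlalchemy.ext.associationproxy import association_proxy
from sqlalchemy.orm import Session, declarative_base, foreign, relationship

Base = declarative_base()

class MenuOptions(Base):
    __tablename__ = "menu_options"

    id = Column(Integer, primary_key=True)
    menu_name = Column(String(50))
    value = Column(Integer)
    document_type = Column(String(50))

class Document(Base):
    __tablename__ = "document"

    id = Column(Integer, primary_key=True)
    document_type_id = Column(Integer)  # no ForeignKey on purpose
    # foreign() marks the un-constrained local column so SQLAlchemy
    # knows the join direction despite the missing FK.
    document_type_proxy = relationship(
        MenuOptions,
        primaryjoin=and_(
            MenuOptions.menu_name == "document_type",
            foreign(document_type_id) == MenuOptions.value,
        ),
        uselist=False,
        viewonly=True,
    )
    document_type = association_proxy("document_type_proxy", "document_type")

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add(MenuOptions(menu_name="document_type", value=3,
                            document_type="invoice"))
    session.add(Document(id=1, document_type_id=3))
    session.commit()
    doc_type = session.get(Document, 1).document_type
    print(doc_type)  # invoice
```

The proxy makes the looked-up value readable directly as Document.document_type, with no joins spelled out at the call site.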
I want to implement models using inheritance, and I've found the django-polymorphic package. But reading about inheritance in Django models, almost every page recommends using abstract = True in the parent model, which duplicates the fields in the subclass tables and makes queries faster.
I've done some testing and found that this library doesn't use the abstract attribute:
class Parent(PolymorphicModel):
    parent_field = models.TextField()

class Child(Parent):
    child_field = models.TextField()
This results in:
Parent table:
| app_parent| CREATE TABLE `app_parent` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`parent_field` longtext NOT NULL,
`polymorphic_ctype_id` int(11),
PRIMARY KEY (`id`),
KEY `app_polymorphic_ctype_id_a7b8d4c7_fk_django_content_type_id` (`polymorphic_ctype_id`),
CONSTRAINT `app_polymorphic_ctype_id_a7b8d4c7_fk_django_content_type_id` FOREIGN KEY (`polymorphic_ctype_id`) REFERENCES `django_content_type` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
Child table:
| app_child | CREATE TABLE `app_child` (
`parent_ptr_id` int(11) NOT NULL,
`child_field` varchar(20) NOT NULL,
PRIMARY KEY (`parent_ptr_id`),
CONSTRAINT `no_parent_ptr_id_079ccc0e_fk_app_parent_id` FOREIGN KEY (`parent_ptr_id`) REFERENCES `app_parent` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
Should I use my own classes with an abstract parent, or should I stick with this library?
Do you need to be able to query the parent table?
Parent.objects.all()
If yes, then you will most probably need to use multi-table inheritance with abstract = False.
Using model inheritance with abstract = False, you get a more complicated database schema with more relations. Creating a child instance takes two inserts instead of one (parent and child table), and querying child data requires a table join. So this method certainly has its shortcomings, but when you want to query for common-column data it's the best-supported way in Django.
django-polymorphic builds on top of standard Django model inheritance by adding an extra column, polymorphic_ctype, which makes it possible to identify the subclass given only a parent object.
There are various ways to achieve similar results with abstract = True, but they often lead to more complicated querying code.
If you use abstract = True, below are two examples of how you can query the common data of all children.
Chaining multiple queries
from itertools import chain

def query_all_childs(**kwargs):
    return chain(
        Child1.objects.filter(**kwargs),
        Child2.objects.filter(**kwargs),
    )
Using database views
Manually create a database view that combines several tables (this can be done by attaching SQL code to the post_migrate signal):
create view myapp_commonchild as
select 'child1' as type, a, b from child1
union all
select 'child2' as type, a, b from child2;
Then create a concrete parent model with managed = False. This flag tells Django to ignore the table in database migrations (because we have manually created a database view for it):
class Parent(models.Model):
    a = models.CharField(max_length=100)
    b = models.CharField(max_length=100)

    class Meta:
        abstract = True

class CommonChild(Parent):
    type = models.CharField(max_length=10)

    class Meta:
        managed = False

class Child1(Parent):
    pass

class Child2(Parent):
    pass
Now you can query CommonChild.objects.all() and access common fields of child classes.
Speaking of performance: I don't know how big your tables are or how heavy the reads and writes are, but most probably using abstract = False will not noticeably impact performance.
I have an existing database and want to access it using SQLAlchemy. Because the database structure is managed by another piece of code (Django ORM, actually) and I don't want to repeat myself by describing every table structure, I'm using autoload introspection. I'm stuck on a simple case of concrete table inheritance.
Payment                      FooPayment
+ id (PK)  <----FK------+    + payment_ptr_id (PK)
+ user_id                    + foo
+ amount
+ date
Here is the code, with the tables' SQL descriptions as docstrings:
class Payment(Base):
    """
    CREATE TABLE payments(
        id serial NOT NULL,
        user_id integer NOT NULL,
        amount numeric(11,2) NOT NULL,
        date timestamp with time zone NOT NULL,
        CONSTRAINT payment_pkey PRIMARY KEY (id),
        CONSTRAINT payment_user_id_fkey FOREIGN KEY (user_id)
            REFERENCES users (id) MATCH SIMPLE)
    """
    __tablename__ = 'payments'
    __table_args__ = {'autoload': True}
    # user = relation(User)

class FooPayment(Payment):
    """
    CREATE TABLE payments_foo(
        payment_ptr_id integer NOT NULL,
        foo integer NOT NULL,
        CONSTRAINT payments_foo_pkey PRIMARY KEY (payment_ptr_id),
        CONSTRAINT payments_foo_payment_ptr_id_fkey
            FOREIGN KEY (payment_ptr_id)
            REFERENCES payments (id) MATCH SIMPLE)
    """
    __tablename__ = 'payments_foo'
    __table_args__ = {'autoload': True}
    __mapper_args__ = {'concrete': True}
The actual tables have additional columns, but this is completely irrelevant to the question, so in an attempt to minimize the code I've simplified everything to the core.
The problem is, when I run this:
payment = session.query(FooPayment).filter(Payment.amount >= 200.0).first()
print payment.date
The resulting SQL is meaningless (note the lack of a join condition):
SELECT payments_foo.payment_ptr_id AS payments_foo_payment_ptr_id,
... /* More `payments_foo' columns and NO columns from `payments' */
FROM payments_foo, payments
WHERE payments.amount >= 200.0 LIMIT 1 OFFSET 0
And when I'm trying to access payment.date I get the following error: Concrete Mapper|FooPayment|payments_foo does not implement attribute u'date' at the instance level.
I've tried adding an implicit foreign key reference id = Column('payment_ptr_id', Integer, ForeignKey('payments_payment.id'), primary_key=True) to FooPayment without any success. Trying print session.query(Payment).first().user works perfectly (I've omitted the User class and commented out the line), so FK introspection works.
How can I perform a simple query on FooPayment and access Payment's values from resulting instance?
I'm using SQLAlchemy 0.5.3, PostgreSQL 8.3, psycopg2 and Python 2.5.2.
Thanks for any suggestions.
Your table structures are similar to those used in joined table inheritance, but they certainly don't correspond to concrete table inheritance, where all fields of the parent class are duplicated in the subclass's table. Right now you have a subclass with fewer fields than the parent and a reference to an instance of the parent class. Switch to joined table inheritance (and use FooPayment.amount in your condition), or give up on inheritance in favor of simple aggregation (a plain reference).
Filtering by a field of another model doesn't automatically add a join condition. Although it's obvious which condition should be used in the join for your example, it's not possible to determine such a condition in general. That's why you have to define a relation property referring to Payment and use its has() method in the filter to get a proper join condition.
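To illustrate the suggested switch, here is a minimal joined-table inheritance sketch (modern declarative with explicit columns instead of autoload, and integer amounts for brevity): without {'concrete': True}, the payment_ptr_id foreign key drives the join, the subclass query emits a proper JOIN, and inherited columns such as amount become accessible:

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Payment(Base):
    __tablename__ = "payments"

    id = Column(Integer, primary_key=True)
    amount = Column(Integer, nullable=False)  # numeric(11,2) in the real schema

class FooPayment(Payment):
    __tablename__ = "payments_foo"

    # No {'concrete': True}: the FK to payments.id makes this joined-table
    # inheritance, so parent columns such as amount are inherited.
    payment_ptr_id = Column(Integer, ForeignKey("payments.id"),
                            primary_key=True)
    foo = Column(Integer)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add(FooPayment(amount=250, foo=42))
    session.commit()
    payment = session.query(FooPayment).filter(
        FooPayment.amount >= 200).first()
    amount, foo = payment.amount, payment.foo
    print(amount, foo)  # 250 42
```

The emitted SELECT joins payments_foo to payments on payment_ptr_id = id, so the filter on the parent column works and the instance carries both tables' attributes.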