I'm struggling to implement the concept of a "scientific paper citation" in SQL.
I have a table of Papers. Each Paper can cite many other Papers and, vice versa, can be cited by many others.
Here's the code I wrote:
class Paper(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    bibliography: List["Citation"] = Relationship(back_populates="citing")
    cited_by: List["Citation"] = Relationship(back_populates="cited")

class Citation(SQLModel, table=True):
    citing_id: Optional[int] = Field(default=None, primary_key=True, foreign_key="paper.id")
    citing: "Paper" = Relationship(back_populates="bibliography")
    cited_id: Optional[int] = Field(default=None, primary_key=True, foreign_key="paper.id")
    cited: "Paper" = Relationship(back_populates="cited_by")
This is not working:
sqlalchemy.exc.AmbiguousForeignKeysError: Could not determine join condition between parent/child tables on relationship Paper.bibliography - there are multiple foreign key paths linking the tables. Specify the 'foreign_keys' argument, providing a list of those columns which should be counted as containing a foreign key reference to the parent table.
The problem is that I wrote foreign_key="paper.id" twice, but I don't know how to fix it.
To reproduce the error:
I'm using Python 3.10.5;
the only dependency is sqlmodel.
from typing import List
from typing import Optional
from sqlmodel import create_engine
from sqlmodel import Field
from sqlmodel import Relationship
from sqlmodel import Session
from sqlmodel import SQLModel
sqlite_file_name = "database.db"
sqlite_url = f"sqlite:///{sqlite_file_name}"
engine = create_engine(sqlite_url, echo=True)
# class Paper(SQLModel, table=True): ...
# class Citation(SQLModel, table=True): ...
if __name__ == "__main__":
    SQLModel.metadata.create_all(engine)
    Paper()
I'm using SQLModel, but an answer in SQLAlchemy would be fine as well.
Handling multiple possible JOIN conditions is covered in the SQLAlchemy documentation (the "Handling Multiple Join Paths" section). The solution is to explicitly pass the foreign_keys argument to relationship() (the RelationshipProperty constructor).
In this case, you need to specify it for all four relationships in question.
Since SQLModel currently does not allow passing all available relationship arguments directly to its Relationship constructor (though I am working on a PR for that), you need to use the sa_relationship_kwargs parameter.
Here is a working example:
from typing import Optional

from sqlmodel import Field, Relationship, SQLModel


class Paper(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    bibliography: list["Citation"] = Relationship(
        back_populates="citing",
        sa_relationship_kwargs={"foreign_keys": "Citation.citing_id"},
    )
    cited_by: list["Citation"] = Relationship(
        back_populates="cited",
        sa_relationship_kwargs={"foreign_keys": "Citation.cited_id"},
    )


class Citation(SQLModel, table=True):
    citing_id: Optional[int] = Field(
        default=None,
        primary_key=True,
        foreign_key="paper.id",
    )
    citing: Paper = Relationship(
        back_populates="bibliography",
        sa_relationship_kwargs={"foreign_keys": "Citation.citing_id"},
    )
    cited_id: Optional[int] = Field(
        default=None,
        primary_key=True,
        foreign_key="paper.id",
    )
    cited: Paper = Relationship(
        back_populates="cited_by",
        sa_relationship_kwargs={"foreign_keys": "Citation.cited_id"},
    )
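Since the question says a plain SQLAlchemy answer is fine as well, here is a minimal sketch of the same mapping without SQLModel. It assumes SQLAlchemy 1.4 or 2.0 and an in-memory SQLite database; the title column exists only for the demo. Each of the four relationship() calls names its own foreign key column, exactly as sa_relationship_kwargs does above:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Paper(Base):
    __tablename__ = "paper"
    id = Column(Integer, primary_key=True)
    title = Column(String)  # demo-only column
    # Citation rows in which this paper is the one doing the citing.
    bibliography = relationship(
        "Citation", back_populates="citing", foreign_keys="Citation.citing_id"
    )
    # Citation rows in which this paper is the one being cited.
    cited_by = relationship(
        "Citation", back_populates="cited", foreign_keys="Citation.cited_id"
    )


class Citation(Base):
    __tablename__ = "citation"
    # Both columns reference paper.id, which is what makes the join ambiguous.
    citing_id = Column(Integer, ForeignKey("paper.id"), primary_key=True)
    cited_id = Column(Integer, ForeignKey("paper.id"), primary_key=True)
    citing = relationship(
        "Paper", back_populates="bibliography", foreign_keys=[citing_id]
    )
    cited = relationship("Paper", back_populates="cited_by", foreign_keys=[cited_id])


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    a, b = Paper(title="A"), Paper(title="B")
    session.add(Citation(citing=a, cited=b))  # A cites B; papers cascade in
    session.commit()
    a_cites = [c.cited.title for c in a.bibliography]
    b_cited_by = [c.citing.title for c in b.cited_by]

print(a_cites, b_cited_by)
```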
As a side note, in this case it might be even nicer to use an association proxy, giving each paper a direct link to all papers it cites and all papers citing it (without the extra "hop" via the Citation object), but I believe this is currently not possible with SQLModel.
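In plain SQLAlchemy, that association-proxy idea could look roughly like the sketch below (assuming SQLAlchemy 1.4+; the proxy names cites and cited_by_papers are made up for this example). Each proxy reads through the Citation rows, and its creator builds a new Citation when a Paper is appended:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.ext.associationproxy import association_proxy
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Citation(Base):
    __tablename__ = "citation"
    citing_id = Column(Integer, ForeignKey("paper.id"), primary_key=True)
    cited_id = Column(Integer, ForeignKey("paper.id"), primary_key=True)
    citing = relationship(
        "Paper", back_populates="bibliography", foreign_keys=[citing_id]
    )
    cited = relationship("Paper", back_populates="cited_by", foreign_keys=[cited_id])


class Paper(Base):
    __tablename__ = "paper"
    id = Column(Integer, primary_key=True)
    title = Column(String)
    bibliography = relationship(
        "Citation", back_populates="citing", foreign_keys="Citation.citing_id"
    )
    cited_by = relationship(
        "Citation", back_populates="cited", foreign_keys="Citation.cited_id"
    )
    # Direct Paper -> Paper views that skip the Citation "hop".
    # Appending a Paper creates the Citation object via `creator`.
    cites = association_proxy(
        "bibliography", "cited", creator=lambda p: Citation(cited=p)
    )
    cited_by_papers = association_proxy(
        "cited_by", "citing", creator=lambda p: Citation(citing=p)
    )


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    a, b, c = Paper(title="A"), Paper(title="B"), Paper(title="C")
    a.cites.append(b)  # same as a.bibliography.append(Citation(cited=b))
    a.cites.append(c)
    session.add(a)  # citations and cited papers are cascaded into the session
    session.commit()
    a_cites = sorted(p.title for p in a.cites)
    c_cited_by = [p.title for p in c.cited_by_papers]
```

The Citation object remains available through bibliography and cited_by, so extra columns (a citation context, for instance) could still be added to it later.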
Related
I'm working on a FastAPI application and using SQLModel with a Postgres backend. I have Post objects, each of which can be upvoted by Users. We represent this with a PostUpvote many-to-many relation between Users and Posts. So far, so boring.
from datetime import datetime
from typing import List, Optional
from pydantic import BaseModel
from sqlmodel import Field, Relationship, SQLModel
import uuid as uuid_pkg

def uuid_hex():
    return uuid_pkg.uuid4().hex

def PkIdField():
    return Field(
        default_factory=uuid_hex,
        primary_key=True,
        index=True,
        nullable=False,
    )

class PostBase(SQLModel):
    title: str
    description: str

class Post(PostBase, table=True):
    creator_id: str = Field(foreign_key="app_users.id")
    id: str = PkIdField()
    created_at: datetime = Field(default_factory=datetime.utcnow, nullable=False)
    creator: "User" = Relationship(back_populates="posts")  # User is defined below
    upvotes: List["PostUpvote"] = Relationship(back_populates="post")

class UserBase(SQLModel):
    email: str

class User(UserBase, table=True):
    # "user" table is reserved by postgres
    __tablename__ = "app_users"
    id: str = PkIdField()
    posts: List["Post"] = Relationship(back_populates="creator")

class PostUpvote(SQLModel, table=True):
    post: Post = Relationship(back_populates="upvotes")
    # SQLModel's default table name for Post is the lowercased class name, "post"
    post_id: str = Field(foreign_key="post.id", primary_key=True)
    user_id: str = Field(foreign_key="app_users.id", primary_key=True)
As you can see, I've set up an upvotes relationship on my Post object, which gives me a list of all the upvotes for that post. But when I'm returning this to the frontend, I don't need or want a list of all the upvotes; I just need the count. Obviously, I can use len(post.upvotes) to get this, but that still requires fetching all the individual upvote objects for that post. So my question is: is there some way to add an upvote_count relationship to my Post object, like so:
class Post(PostBase, table=True):
    creator_id: str = Field(foreign_key="app_users.id")
    id: str = PkIdField()
    created_at: datetime = Field(default_factory=datetime.utcnow, nullable=False)
    creator: "User" = Relationship(back_populates="posts")
    upvotes: List["PostUpvote"] = Relationship(back_populates="post")
    upvote_count: int = Relationship(...)
Note that this is using SQLModel's Relationship feature (https://sqlmodel.tiangolo.com/tutorial/relationship-attributes/), not SQLAlchemy relations (though I am running SQLAlchemy under the hood).
If there's some way to provide a custom SQLAlchemy query to the SQLModel relationship, that would solve the problem neatly. But I've not been able to find anything in the SQLModel docs about how to do so. Is this even possible? Or should I just resign myself to doing the query manually?
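In plain SQLAlchemy (not SQLModel's Relationship), one way to get a count without loading the rows is a column_property holding a correlated scalar subquery, as described in SQLAlchemy's "SQL Expressions as Mapped Attributes" docs. A minimal sketch, assuming SQLAlchemy 1.4+ and SQLite in memory, with the users table left out and the table names posts and posts_upvotes chosen only for the demo. Whether SQLModel lets you attach such a property to a table=True model is not confirmed here:

```python
from sqlalchemy import Column, ForeignKey, String, create_engine, func, select
from sqlalchemy.orm import Session, column_property, declarative_base, relationship

Base = declarative_base()


class PostUpvote(Base):
    __tablename__ = "posts_upvotes"
    post_id = Column(String, ForeignKey("posts.id"), primary_key=True)
    user_id = Column(String, primary_key=True)  # FK to users omitted in this sketch


class Post(Base):
    __tablename__ = "posts"
    id = Column(String, primary_key=True)
    title = Column(String)
    upvotes = relationship(PostUpvote)
    # Correlated scalar subquery, evaluated by the database whenever a Post
    # row is loaded, so the individual PostUpvote rows are never fetched.
    upvote_count = column_property(
        select(func.count(PostUpvote.user_id))
        .where(PostUpvote.post_id == id)
        .correlate_except(PostUpvote)
        .scalar_subquery()
    )


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Post(id="p1", title="Hello"))
    session.add_all([PostUpvote(post_id="p1", user_id=u) for u in ("u1", "u2", "u3")])
    session.commit()
    count = session.get(Post, "p1").upvote_count
```

The count is only as fresh as the last load of the Post; after adding upvotes in the same session, the object needs to be refreshed or expired before upvote_count changes.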
I am wondering whether I can use SQLAlchemy's relationship() with an ARRAY of foreign keys,
like in the example below:
from sqlalchemy import Column, Integer, String, ForeignKey
from sqlalchemy.types import ARRAY
from sqlalchemy.orm import relationship
from database import Base

class Topic(Base):
    __tablename__ = 'topics'
    id = Column(Integer, primary_key=True, index=True)
    topic_name = Column(String)

class Room(Base):
    __tablename__ = 'rooms'
    id = Column(Integer, primary_key=True, index=True)
    room_name = Column(String)
    body = Column(String)
    topics_id = Column(ARRAY(Integer, ForeignKey("topics.id")))
    topics = relationship("Topic", foreign_keys='topics_id', uselist=True)  # Single-direction
The relationship is that a Room has many topics in it, but a Topic does not need to belong to any room.
Since the Room model already has a topics_id field that stores an array of foreign keys to the Topic model, I tried passing this topics_id to relationship().
I also do not want Topic to have a link back to the Room model.
But it gives me this error:
sqlalchemy.exc.InvalidRequestError: When initializing mapper mapped class Room->rooms, expression 'topics_id' failed to locate a name ("name 'topics_id' is not defined"). If this is a class name, consider adding this relationship() to the <class 'models.Room'> class after both dependent classes have been defined.
You can create a many-to-many relation for a schema like this. In the many-to-many relation, your topics can be a part of many rooms and every room can have multiple topics.
You can read about it in the official SQLAlchemy documentation:
https://docs.sqlalchemy.org/en/14/orm/basic_relationships.html#many-to-many
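A minimal sketch of that many-to-many setup for the Room/Topic models, assuming SQLAlchemy 1.4+ and an in-memory SQLite database; the association table name room_topics is made up here. The ARRAY column is replaced by one row per (room, topic) pair, and leaving out backref/back_populates keeps the link single-direction, as the question wants:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

# Association table: one row per (room, topic) pair.
room_topics = Table(
    "room_topics",
    Base.metadata,
    Column("room_id", Integer, ForeignKey("rooms.id"), primary_key=True),
    Column("topic_id", Integer, ForeignKey("topics.id"), primary_key=True),
)


class Topic(Base):
    __tablename__ = "topics"
    id = Column(Integer, primary_key=True, index=True)
    topic_name = Column(String)


class Room(Base):
    __tablename__ = "rooms"
    id = Column(Integer, primary_key=True, index=True)
    room_name = Column(String)
    body = Column(String)
    # No backref: Topic gets no attribute pointing back at Room.
    topics = relationship("Topic", secondary=room_topics)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    t_python, t_sql = Topic(topic_name="python"), Topic(topic_name="sql")
    session.add_all([
        Room(room_name="r1", topics=[t_python, t_sql]),
        Room(room_name="r2", topics=[t_python]),  # a topic shared by two rooms
    ])
    session.commit()
    r1 = session.query(Room).filter_by(room_name="r1").one()
    r2 = session.query(Room).filter_by(room_name="r2").one()
    r1_topics = sorted(t.topic_name for t in r1.topics)
    r2_topics = [t.topic_name for t in r2.topics]
```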
I'm using sqlacodegen to reflect a bunch of tables from my database,
and I'm getting the following error:
sqlalchemy.exc.AmbiguousForeignKeysError: Can't determine join between 'Employee' and 'Sales'; tables have more than one foreign key constraint relationship between them. Please specify the 'onclause' of this join explicitly.
Here's a simplified version of my tables.
I read in the documentation that I should use the foreign_keys parameter to resolve ambiguity between foreign key targets. However, I think this problem is caused by the inheritance. Could someone help me understand what is going on?
# coding: utf-8
from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

class Employee(Base):
    __tablename__ = 'Employee'
    EmployeeId = Column(Integer, primary_key=True)

class Sales(Employee):
    __tablename__ = 'Sales'
    EmployeeID = Column(ForeignKey('Employee.EmployeeId'), primary_key=True)
    OldemployeeID = Column(ForeignKey('Employee.EmployeeId'))
    employee = relationship('Employee', foreign_keys=[EmployeeID])
    old_employee = relationship("Employee", foreign_keys=[OldemployeeID])
When your tables have multiple possible paths to inherit between them (Sales.EmployeeID or Sales.OldemployeeID), SQLAlchemy doesn't know which one to use, and you need to tell it the path explicitly using inherit_condition. For instance, to inherit by EmployeeID:
class Sales(Employee):
    ...
    __mapper_args__ = {"inherit_condition": EmployeeID == Employee.EmployeeId}
For the sake of example, you could also inherit by OldemployeeID, by writing OldemployeeID == Employee.EmployeeId; this would mean that the Sales primary key and the Employee primary key are allowed to differ.
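A runnable sketch of the inherit_condition approach, assuming SQLAlchemy 1.4+ and an in-memory SQLite database. It keeps only the old_employee relationship: under joined inheritance, EmployeeID just links a Sales row to its own Employee row, so a separate relationship over it would point back at the same object:

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Employee(Base):
    __tablename__ = "Employee"
    EmployeeId = Column(Integer, primary_key=True)


class Sales(Employee):
    __tablename__ = "Sales"
    # Two foreign keys to Employee: the inheritance link and an ordinary reference.
    EmployeeID = Column(Integer, ForeignKey("Employee.EmployeeId"), primary_key=True)
    OldemployeeID = Column(Integer, ForeignKey("Employee.EmployeeId"))

    # Ordinary many-to-one reference; foreign_keys picks the column to join on.
    old_employee = relationship("Employee", foreign_keys=[OldemployeeID])

    # Tell the mapper which foreign key path is the inheritance link.
    __mapper_args__ = {"inherit_condition": EmployeeID == Employee.EmployeeId}


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    old = Employee()
    session.add(old)
    session.commit()

    sale = Sales(old_employee=old)
    session.add(sale)
    session.commit()
    link_ok = sale.EmployeeID == sale.EmployeeId  # inheritance link copied over
    old_ok = sale.OldemployeeID == old.EmployeeId  # ordinary FK populated
```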
Just use backref and use Integer on both EmployeeID and OldemployeeID. Otherwise you will get another error.
class Sales(Employee):
    __tablename__ = 'Sales'
    EmployeeID = Column(Integer, ForeignKey('Employee.EmployeeId'), primary_key=True)
    OldemployeeID = Column(Integer, ForeignKey('Employee.EmployeeId'))
    # each backref needs its own name; two backrefs both called 'Employee' would clash on Employee
    employee = relationship('Employee', foreign_keys=[EmployeeID], backref='sales')
    old_employee = relationship("Employee", foreign_keys=[OldemployeeID], backref='old_sales')
Related to this question: SQLAlchemy logging of changes with date and user.
I'm using a modified version of the "recipe" for versioning changes automatically. I think it's able to handle some forms of relationships already (not sure, though), but I'm not able to handle the case where there's a many-to-many relationship in a separate table.
Here's a simple example that's an issue:
from sqlalchemy import BINARY, Column, ForeignKey, String, Table
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship
from history_meta import (Versioned, versioned_session)

Base = declarative_base()

user_to_group = Table('user_to_group', Base.metadata,
    Column('user_login', String(60), ForeignKey('user.login')),
    Column('group_name', String(100), ForeignKey('group.name'))
)

class User(Versioned, Base):
    __tablename__ = 'user'
    login = Column(String(60), primary_key=True, nullable=False)
    password = Column(BINARY(20), nullable=False)

class Group(Versioned, Base):
    __tablename__ = 'group'
    name = Column(String(100), primary_key=True, nullable=False)
    description = Column(String(100), nullable=True)
    users = relationship(User, secondary=user_to_group, backref='groups')
When generating the tables in the database with Base.metadata.create_all(engine), I can see that there are only 5 tables: user, group, user_to_group, user_history, and group_history. There is no user_to_group_history.
The "versioning" gets added to the declarative objects through inheritance of Versioned, but there's no way (that I can see) to do something similar with the user_to_group table, which isn't using the declarative format. There are also notes in the documentation saying that it's not a good idea to use a table that's mapped to a class, so I'm trying to avoid using a declarative object for the relationship.
I am currently trying to create the following database schema with SQLAlchemy (using ext.declarative):
I have a base class MyBaseClass which provides some common functionality for all of my publicly accessible classes, a mixin class MetadataMixin that provides functionality to query metadata from imdb and store it.
Every class that subclasses MetadataMixin has a field persons which provides a M:N relationship to instances of the Person class, and a field persons_roles which provides a 1:N relationship to an object (one for each subclass) which stores the role a concrete Person plays in the instance of the subclass.
This is an abbreviated version of what my code looks like at the moment:
from sqlalchemy import Column, Integer, Enum, ForeignKey, Unicode
from sqlalchemy.orm import relationship
from sqlalchemy.ext.associationproxy import association_proxy
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class MyBaseClass(object):
    """Base class for all publicly accessible classes"""
    id = Column(Integer, primary_key=True)

class Person(MyBaseClass):
    """A Person"""
    name = Column(Unicode)
    movies = association_proxy('movie_roles', 'movie',
                               creator=lambda m: _PersonMovieRole(movie=m))
    shows = association_proxy('show_roles', 'show',
                              creator=lambda s: _PersonShowRole(show=s))

class _PersonMovieRole(Base):
    """Role for a Person in a Movie"""
    __tablename__ = 'persons_movies'
    id = Column(Integer, primary_key=True)
    role = Column(Enum('none', 'actor', 'writer', 'director', 'producer'),
                  default='none')
    person_id = Column(Integer, ForeignKey('persons.id'))
    person = relationship('Person', backref='movie_roles')
    movie_id = Column(Integer, ForeignKey('movies.id'))
    movie = relationship('Movie', backref='persons_roles')

class _PersonShowRole(Base):
    """Role for a Person in a Show"""
    __tablename__ = 'persons_shows'
    id = Column(Integer, primary_key=True)
    role = Column(Enum('none', 'actor', 'writer', 'director', 'producer'),
                  default='none')
    person_id = Column(Integer, ForeignKey('persons.id'))
    person = relationship('Person', backref='show_roles')
    show_id = Column(Integer, ForeignKey('shows.id'))
    show = relationship('Episode', backref='persons_roles')

class MetadataMixin(object):
    """Mixin class that provides metadata-fields and methods"""
    # ...
    persons = association_proxy('persons_roles', 'person',
                                creator= #...???...#)

class Movie(Base, MyBaseClass, MetadataMixin):
    #....
    pass
What I'm trying to do is to create a generic creator function for association_proxy that creates either a PersonMovieRole or a PersonShowRole object, depending on the class of the concrete instance that a Person is added to. What I'm stuck on at the moment is that I don't know how to pass the calling class to the creator function.
Is this possible, or is there maybe even an easier way for what I'm trying to accomplish?
By the time your persons field is defined, you cannot really know what class it will end up in. Python takes ready-made dictionaries of class members and creates classes out of them (via type.__new__), but by the time that happens, those members are already fully defined.
So you need to provide the required information directly to the mixin, and tolerate the small duplication this creates in your code. I'd opt for an interface similar to this one:
class Movie(Base, MyBaseClass, MetadataMixin('Movie')):
    pass
(You cannot have MetadataMixin(Movie) either, for the exact same reasons: Movie requires its base classes to be completely defined by the time the class is created).
To implement such a "parametrized class", simply use a function:
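def MetadataMixin(cls_name):
    """Mixin class that provides metadata-fields and methods"""
    person_role_cls_name = '_Person%sRole' % cls_name
    # Pre-1.4 SQLAlchemy registry; in 1.4+ the (private) equivalent is Base.registry._class_registry
    person_role_cls = Base._decl_class_registry[person_role_cls_name]

    class Mixin(object):
        # ...
        # creator receives the Person being added; declarative constructors
        # only accept keyword arguments, so wrap the class in a lambda
        persons = association_proxy('persons_roles', 'person',
                                    creator=lambda person: person_role_cls(person=person))

    return Mixin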
This works because what we're looking up in Base._decl_class_registry (the registry of all classes descending from your declarative base) is not the final class (e.g. Movie), but the association object (e.g. _PersonMovieRole), which therefore has to be defined before the class that uses the mixin.
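A variant of the same factory idea that passes the association class itself instead of looking its name up in the (private) class registry. This is a sketch assuming SQLAlchemy 1.4+, with simplified tables (no MyBaseClass, a plain String role column) and the class name PersonMovieRole chosen for the demo. The role class only refers to Movie by string, so it can be defined first:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.ext.associationproxy import association_proxy
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Person(Base):
    __tablename__ = "persons"
    id = Column(Integer, primary_key=True)
    name = Column(String)


class PersonMovieRole(Base):
    """Association object: the role a Person plays in a Movie."""
    __tablename__ = "persons_movies"
    id = Column(Integer, primary_key=True)
    role = Column(String, default="none")
    person_id = Column(Integer, ForeignKey("persons.id"))
    person = relationship("Person")
    movie_id = Column(Integer, ForeignKey("movies.id"))
    # The backref adds Movie.persons_roles, which the proxy reads through.
    movie = relationship("Movie", backref="persons_roles")


def MetadataMixin(role_cls):
    """Return a new mixin class whose `persons` proxy creates `role_cls` objects."""

    class Mixin:
        # creator is called with the Person being appended and must return a
        # new association object; declarative constructors take keywords only.
        persons = association_proxy(
            "persons_roles", "person", creator=lambda p: role_cls(person=p)
        )

    return Mixin


class Movie(Base, MetadataMixin(PersonMovieRole)):
    __tablename__ = "movies"
    id = Column(Integer, primary_key=True)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    alice = Person(name="Alice")
    movie = Movie()
    movie.persons.append(alice)  # builds PersonMovieRole(person=alice)
    session.add(movie)
    session.commit()
    role_types = [type(r).__name__ for r in movie.persons_roles]
    names = [p.name for p in movie.persons]
```

Passing the class object avoids depending on a private registry attribute whose name changed between SQLAlchemy versions, at the cost of having to define each role class before the class that uses the mixin.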